| # What Bootstrapping does |
| |
| <!-- toc --> |
| |
| [*Bootstrapping*][boot] is the process of using a compiler to compile itself. |
| More accurately, it means using an older compiler to compile a newer version of |
| the same compiler. |
| |
| This raises a chicken-and-egg paradox: where did the first compiler come from? |
| It must have been written in a different language. In Rust's case it was |
| [written in OCaml][ocaml-compiler]. However it was abandoned long ago and the |
| only way to build a modern version of `rustc` is a slightly less modern version. |
| |
| This is exactly how [`./x.py`] works: it downloads the current beta release of |
| `rustc`, then uses it to compile the new compiler. |
| |
| [`./x.py`]: https://github.com/rust-lang/rust/blob/master/x.py |
| |
| Note that this documentation mostly covers user-facing information. See |
| [bootstrap/README.md][bootstrap-internals] to read about bootstrap internals. |
| |
| [bootstrap-internals]: https://github.com/rust-lang/rust/blob/master/src/bootstrap/README.md |
| |
| ## Stages of bootstrapping |
| |
| ### Overview |
| |
| - Stage 0: the pre-compiled compiler |
| - Stage 1: from current code, by an earlier compiler |
| - Stage 2: the truly current compiler |
| - Stage 3: the same-result test |
| |
| Compiling `rustc` is done in stages. Here's a diagram, adapted from Jynn |
| Nelson's [talk on bootstrapping][rustconf22-talk] at RustConf 2022, with |
| detailed explanations below. |
| |
| The `A`, `B`, `C`, and `D` show the ordering of the stages of bootstrapping. |
| <span style="background-color: lightblue; color: black">Blue</span> nodes are |
| downloaded, <span style="background-color: yellow; color: black">yellow</span> |
| nodes are built with the `stage0` compiler, and <span style="background-color: |
| lightgreen; color: black">green</span> nodes are built with the `stage1` |
| compiler. |
| |
| [rustconf22-talk]: https://www.youtube.com/watch?v=oUIjG-y4zaA |
| |
| ```mermaid |
| graph TD |
| s0c["stage0 compiler (1.63)"]:::downloaded -->|A| s0l("stage0 std (1.64)"):::with-s0c; |
| s0c & s0l --- stepb[ ]:::empty; |
| stepb -->|B| s0ca["stage0 compiler artifacts (1.64)"]:::with-s0c; |
| s0ca -->|copy| s1c["stage1 compiler (1.64)"]:::with-s0c; |
| s1c -->|C| s1l("stage1 std (1.64)"):::with-s1c; |
| s1c & s1l --- stepd[ ]:::empty; |
| stepd -->|D| s1ca["stage1 compiler artifacts (1.64)"]:::with-s1c; |
| s1ca -->|copy| s2c["stage2 compiler"]:::with-s1c; |
| |
| classDef empty width:0px,height:0px; |
| classDef downloaded fill: lightblue; |
| classDef with-s0c fill: yellow; |
| classDef with-s1c fill: lightgreen; |
| ``` |
| |
| ### Stage 0: the pre-compiled compiler |
| |
| The stage0 compiler is usually the current _beta_ `rustc` compiler and its |
| associated dynamic libraries, which `./x.py` will download for you. (You can |
| also configure `./x.py` to use something else.) |
| |
| The stage0 compiler is then used only to compile [`src/bootstrap`], |
| [`library/std`], and [`compiler/rustc`]. When assembling the libraries and |
| binaries that will become the stage1 `rustc` compiler, the freshly compiled |
| `std` and `rustc` are used. There are two concepts at play here: a compiler |
| (with its set of dependencies) and its 'target' or 'object' libraries (`std` and |
| `rustc`). Both are staged, but in a staggered manner. |
| |
| [`compiler/rustc`]: https://github.com/rust-lang/rust/tree/master/compiler/rustc |
| [`library/std`]: https://github.com/rust-lang/rust/tree/master/library/std |
| [`src/bootstrap`]: https://github.com/rust-lang/rust/tree/master/src/bootstrap |
| |
| ### Stage 1: from current code, by an earlier compiler |
| |
| The rustc source code is then compiled with the `stage0` compiler to produce the |
| `stage1` compiler. |
| |
| ### Stage 2: the truly current compiler |
| |
| We then rebuild our `stage1` compiler with itself to produce the `stage2` |
| compiler. |
| |
| In theory, the `stage1` compiler is functionally identical to the `stage2` |
| compiler, but in practice there are subtle differences. In particular, the |
| `stage1` compiler itself was built by `stage0` and hence not by the source in |
| your working directory. This means that the ABI generated by the `stage0` |
| compiler may not match the ABI that would have been made by the `stage1` |
| compiler, which can cause problems for dynamic libraries, tests, and tools using |
| `rustc_private`. |
| |
| Note that the `proc_macro` crate avoids this issue with a `C` FFI layer called |
| `proc_macro::bridge`, allowing it to be used with `stage1`. |
| |
| The `stage2` compiler is the one distributed with `rustup` and all other install |
| methods. However, it takes a very long time to build because one must first |
| build the new compiler with an older compiler and then use that to build the new |
| compiler with itself. For development, you usually only want the `stage1` |
| compiler, which you can build with `./x build library`. See [Building the |
| compiler](../how-to-build-and-run.html#building-the-compiler). |
| |
| ### Stage 3: the same-result test |
| |
| Stage 3 is optional. To sanity check our new compiler we can build the libraries |
| with the `stage2` compiler. The result ought to be identical to before, unless |
| something has broken. |
| |
| ### Building the stages |
| |
| The script [`./x`] tries to be helpful and pick the stage you most likely meant |
| for each subcommand. These defaults are as follows: |
| |
| - `check`: `--stage 0` |
| - `doc`: `--stage 0` |
| - `build`: `--stage 1` |
| - `test`: `--stage 1` |
| - `dist`: `--stage 2` |
| - `install`: `--stage 2` |
| - `bench`: `--stage 2` |
| |
| You can always override the stage by passing `--stage N` explicitly. |
| |
| For more information about stages, [see |
| below](#understanding-stages-of-bootstrap). |
| |
| [`./x`]: https://github.com/rust-lang/rust/blob/master/x |
| |
| ## Complications of bootstrapping |
| |
| Since the build system uses the current beta compiler to build a `stage1` |
| bootstrapping compiler, the compiler source code can't use some features until |
| they reach beta (because otherwise the beta compiler doesn't support them). On |
| the other hand, for [compiler intrinsics][intrinsics] and internal features, the |
| features _have_ to be used. Additionally, the compiler makes heavy use of |
| `nightly` features (`#![feature(...)]`). How can we resolve this problem? |
| |
| There are two methods used: |
| |
| 1. The build system sets `--cfg bootstrap` when building with `stage0`, so we |
| can use `cfg(not(bootstrap))` to only use features when built with `stage1`. |
| Setting `--cfg bootstrap` in this way is used for features that were just |
| stabilized, which require `#![feature(...)]` when built with `stage0`, but |
| not for `stage1`. |
| 2. The build system sets `RUSTC_BOOTSTRAP=1`. This special variable means to |
| _break the stability guarantees_ of Rust: allowing use of `#![feature(...)]` |
| with a compiler that's not `nightly`. _Setting `RUSTC_BOOTSTRAP=1` should |
| never be used except when bootstrapping the compiler._ |
| |
| [boot]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) |
| [intrinsics]: ../../appendix/glossary.md#intrinsic |
| [ocaml-compiler]: https://github.com/rust-lang/rust/tree/ef75860a0a72f79f97216f8aaa5b388d98da6480/src/boot |
| |
| ## Understanding stages of bootstrap |
| |
| ### Overview |
| |
| This is a detailed look into the separate bootstrap stages. |
| |
| The convention `./x` uses is that: |
| |
| - A `--stage N` flag means to run the stage N compiler (`stageN/rustc`). |
| - A "stage N artifact" is a build artifact that is _produced_ by the stage N |
| compiler. |
| - The stage N+1 compiler is assembled from stage N *artifacts*. This process is |
| called _uplifting_. |
| |
| #### Build artifacts |
| |
| Anything you can build with `./x` is a _build artifact_. Build artifacts |
| include, but are not limited to: |
| |
| - binaries, like `stage0-rustc/rustc-main` |
| - shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so` |
| - [rlib] files, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib` |
| - HTML files generated by rustdoc, like `doc/std` |
| |
| [rlib]: ../../serialization.md |
| |
| #### Examples |
| |
| - `./x test tests/ui` means to build the `stage1` compiler and run `compiletest` |
| on it. If you're working on the compiler, this is normally the test command |
| you want. |
| - `./x test --stage 0 library/std` means to run tests on the standard library |
| without building `rustc` from source ('build with `stage0`, then test the |
| artifacts'). If you're working on the standard library, this is normally the |
| test command you want. |
| - `./x build --stage 0` means to build with the beta `rustc`. |
| - `./x doc --stage 0` means to document using the beta `rustdoc`. |
| |
| #### Examples of what *not* to do |
| |
| - `./x test --stage 0 tests/ui` is not useful: it runs tests on the _beta_ |
| compiler and doesn't build `rustc` from source. Use `test tests/ui` instead, |
| which builds `stage1` from source. |
| - `./x test --stage 0 compiler/rustc` builds the compiler but runs no tests: |
| it's running `cargo test -p rustc`, but `cargo` doesn't understand Rust's |
| tests. You shouldn't need to use this, use `test` instead (without arguments). |
| - `./x build --stage 0 compiler/rustc` builds the compiler, but does not build |
| `libstd` or even `libcore`. Most of the time, you'll want `./x build library` |
| instead, which allows compiling programs without needing to define lang items. |
| |
| ### Building vs. running |
| |
| Note that `build --stage N compiler/rustc` **does not** build the stage N |
| compiler: instead it builds the stage N+1 compiler _using_ the stage N compiler. |
| |
| In short, _stage 0 uses the `stage0` compiler to create `stage0` artifacts which |
| will later be uplifted to be the stage1 compiler_. |
| |
| In each stage, two major steps are performed: |
| |
| 1. `std` is compiled by the stage N compiler. |
| 2. That `std` is linked to programs built by the stage N compiler, including the |
| stage N artifacts (stage N+1 compiler). |
| |
| This is somewhat intuitive if one thinks of the stage N artifacts as "just" |
| another program we are building with the stage N compiler: `build --stage N |
| compiler/rustc` is linking the stage N artifacts to the `std` built by the stage |
| N compiler. |
| |
| ### Stages and `std` |
| |
| Note that there are two `std` libraries in play here: |
| |
| 1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage |
| N-1 `std`) |
| 2. The library _used to compile programs_ with `stageN/rustc`, which was built |
| by stage N (stage N `std`). |
| |
| Stage N `std` is pretty much necessary for any useful work with the stage N |
| compiler. Without it, you can only compile programs with `#![no_core]` -- not |
| terribly useful! |
| |
| The reason these need to be different is because they aren't necessarily |
| ABI-compatible: there could be new layout optimizations, changes to `MIR`, or |
| other changes to Rust metadata on `nightly` that aren't present in beta. |
| |
| This is also where `--keep-stage 1 library/std` comes into play. Since most |
| changes to the compiler don't actually change the ABI, once you've produced a |
| `std` in `stage1`, you can probably just reuse it with a different compiler. If |
| the ABI hasn't changed, you're good to go, no need to spend time recompiling |
| that `std`. The flag `--keep-stage` simply instructs the build script to assumes |
| the previous compile is fine and copies those artifacts into the appropriate |
| place, skipping the `cargo` invocation. |
| |
| ### Cross-compiling rustc |
| |
| *Cross-compiling* is the process of compiling code that will run on another |
| architecture. For instance, you might want to build an ARM version of rustc |
| using an x86 machine. Building `stage2` `std` is different when you are |
| cross-compiling. |
| |
| This is because `./x` uses the following logic: if `HOST` and `TARGET` are the |
| same, it will reuse `stage1` `std` for `stage2`! This is sound because `stage1` |
| `std` was compiled with the `stage1` compiler, i.e. a compiler using the source |
| code you currently have checked out. So it should be identical (and therefore |
| ABI-compatible) to the `std` that `stage2/rustc` would compile. |
| |
| However, when cross-compiling, `stage1` `std` will only run on the host. So the |
| `stage2` compiler has to recompile `std` for the target. |
| |
| (See in the table how `stage2` only builds non-host `std` targets). |
| |
| ### Why does only libstd use `cfg(bootstrap)`? |
| |
| For docs on `cfg(bootstrap)` itself, see [Complications of |
| Bootstrapping](#complications-of-bootstrapping). |
| |
| The `rustc` generated by the `stage0` compiler is linked to the freshly-built |
| `std`, which means that for the most part only `std` needs to be `cfg`-gated, so |
| that `rustc` can use features added to `std` immediately after their addition, |
| without need for them to get into the downloaded `beta` compiler. |
| |
| Note this is different from any other Rust program: `stage1` `rustc` is built by |
| the _beta_ compiler, but using the _master_ version of `libstd`! |
| |
| The only time `rustc` uses `cfg(bootstrap)` is when it adds internal lints that |
| use diagnostic items, or when it uses unstable library features that were |
| recently changed. |
| |
| ### What is a 'sysroot'? |
| |
| When you build a project with `cargo`, the build artifacts for dependencies are |
| normally stored in `target/debug/deps`. This only contains dependencies `cargo` |
| knows about; in particular, it doesn't have the standard library. Where do `std` |
| or `proc_macro` come from? They come from the **sysroot**, the root of a number |
| of directories where the compiler loads build artifacts at runtime. The |
| `sysroot` doesn't just store the standard library, though - it includes anything |
| that needs to be loaded at runtime. That includes (but is not limited to): |
| |
| - Libraries `libstd`/`libtest`/`libproc_macro`. |
| - Compiler crates themselves, when using `rustc_private`. In-tree these are |
| always present; out of tree, you need to install `rustc-dev` with `rustup`. |
| - Shared object file `libLLVM.so` for the LLVM project. In-tree this is either |
| built from source or downloaded from CI; out-of-tree, you need to install |
| `llvm-tools-preview` with `rustup`. |
| |
| All the artifacts listed so far are *compiler* runtime dependencies. You can see |
| them with `rustc --print sysroot`: |
| |
| ``` |
| $ ls $(rustc --print sysroot)/lib |
| libchalk_derive-0685d79833dc9b2b.so libstd-25c6acf8063a3802.so |
| libLLVM-11-rust-1.50.0-nightly.so libtest-57470d2aa8f7aa83.so |
| librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so |
| librustc_macros-5f0ec4a119c6ac86.so rustlib |
| ``` |
| |
| There are also runtime dependencies for the standard library! These are in |
| `lib/rustlib/`, not `lib/` directly. |
| |
| ``` |
| $ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5 |
| libaddr2line-6c8e02b8fedc1e5f.rlib |
| libadler-9ef2480568df55af.rlib |
| liballoc-9c4002b5f79ba0e1.rlib |
| libcfg_if-512eb53291f6de7e.rlib |
| libcompiler_builtins-ef2408da76957905.rlib |
| ``` |
| |
| Directory `lib/rustlib/` includes libraries like `hashbrown` and `cfg_if`, which |
| are not part of the public API of the standard library, but are used to |
| implement it. Also `lib/rustlib/` is part of the search path for linkers, but |
| `lib` will never be part of the search path. |
| |
| #### `-Z force-unstable-if-unmarked` |
| |
| Since `lib/rustlib/` is part of the search path we have to be careful about |
| which crates are included in it. In particular, all crates except for the |
| standard library are built with the flag `-Z force-unstable-if-unmarked`, which |
| means that you have to use `#![feature(rustc_private)]` in order to load it (as |
| opposed to the standard library, which is always available). |
| |
| The `-Z force-unstable-if-unmarked` flag has a variety of purposes to help |
| enforce that the correct crates are marked as `unstable`. It was introduced |
| primarily to allow rustc and the standard library to link to arbitrary crates on |
| crates.io which do not themselves use `staged_api`. `rustc` also relies on this |
| flag to mark all of its crates as `unstable` with the `rustc_private` feature so |
| that each crate does not need to be carefully marked with `unstable`. |
| |
| This flag is automatically applied to all of `rustc` and the standard library by |
| the bootstrap scripts. This is needed because the compiler and all of its |
| dependencies are shipped in `sysroot` to all users. |
| |
| This flag has the following effects: |
| |
| - Marks the crate as "`unstable`" with the `rustc_private` feature if it is not |
| itself marked as `stable` or `unstable`. |
| - Allows these crates to access other forced-unstable crates without any need |
| for attributes. Normally a crate would need a `#![feature(rustc_private)]` |
| attribute to use other `unstable` crates. However, that would make it |
| impossible for a crate from crates.io to access its own dependencies since |
| that crate won't have a `feature(rustc_private)` attribute, but *everything* |
| is compiled with `-Z force-unstable-if-unmarked`. |
| |
| Code which does not use `-Z force-unstable-if-unmarked` should include the |
| `#![feature(rustc_private)]` crate attribute to access these forced-unstable |
| crates. This is needed for things which link `rustc` its self, such as `MIRI` or |
| `clippy`. |
| |
| You can find more discussion about sysroots in: |
| - The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded |
| from `sysroot` |
| - [Discussions about sysroot on |
| Zulip](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/deps.20in.20sysroot/) |
| - [Discussions about building rustdoc out of |
| tree](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/How.20to.20create.20an.20executable.20accessing.20.60rustc_private.60.3F) |
| |
| [rustdoc PR]: https://github.com/rust-lang/rust/pull/76728 |
| |
| ## Passing flags to commands invoked by `bootstrap` |
| |
| Conveniently `./x` allows you to pass stage-specific flags to `rustc` and |
| `cargo` when bootstrapping. The `RUSTFLAGS_BOOTSTRAP` environment variable is |
| passed as `RUSTFLAGS` to the bootstrap stage (`stage0`), and |
| `RUSTFLAGS_NOT_BOOTSTRAP` is passed when building artifacts for later stages. |
| `RUSTFLAGS` will work, but also affects the build of `bootstrap` itself, so it |
| will be rare to want to use it. Finally, `MAGIC_EXTRA_RUSTFLAGS` bypasses the |
| `cargo` cache to pass flags to rustc without recompiling all dependencies. |
| |
| - `RUSTDOCFLAGS`, `RUSTDOCFLAGS_BOOTSTRAP` and `RUSTDOCFLAGS_NOT_BOOTSTRAP` are |
| analogous to `RUSTFLAGS`, but for `rustdoc`. |
| - `CARGOFLAGS` will pass arguments to cargo itself (e.g. `--timings`). |
| `CARGOFLAGS_BOOTSTRAP` and `CARGOFLAGS_NOT_BOOTSTRAP` work analogously to |
| `RUSTFLAGS_BOOTSTRAP`. |
| - `--test-args` will pass arguments through to the test runner. For `tests/ui`, |
| this is `compiletest`. For unit tests and doc tests this is the `libtest` |
| runner. |
| |
| Most test runner accept `--help`, which you can use to find out the options |
| accepted by the runner. |
| |
| ## Environment Variables |
| |
| During bootstrapping, there are a bunch of compiler-internal environment |
| variables that are used. If you are trying to run an intermediate version of |
| `rustc`, sometimes you may need to set some of these environment variables |
| manually. Otherwise, you get an error like the following: |
| |
| ```text |
| thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5 |
| ``` |
| |
| If `./stageN/bin/rustc` gives an error about environment variables, that usually |
| means something is quite wrong -- such as you're trying to compile `rustc` or |
| `std` or something which depends on environment variables. In the unlikely case |
| that you actually need to invoke `rustc` in such a situation, you can tell the |
| bootstrap shim to print all `env` variables by adding `-vvv` to your `x` |
| command. |
| |
| Finally, bootstrap makes use of the [cc-rs crate] which has [its own |
| method][env-vars] of configuring `C` compilers and `C` flags via environment |
| variables. |
| |
| [cc-rs crate]: https://github.com/rust-lang/cc-rs |
| [env-vars]: https://docs.rs/cc/latest/cc/#external-configuration-via-environment-variables |
| |
| ## Clarification of build command's `stdout` |
| |
| In this part, we will investigate the build command's `stdout` in an action |
| (similar, but more detailed and complete documentation compare to topic above). |
| When you execute `x build --dry-run` command, the build output will be something |
| like the following: |
| |
| ```text |
| Building stage0 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu) |
| Copying stage0 library from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu) |
| Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu) |
| Copying stage0 rustc from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu) |
| Assembling stage1 compiler (x86_64-unknown-linux-gnu) |
| Building stage1 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu) |
| Copying stage1 library from stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu) |
| Building stage1 tool rust-analyzer-proc-macro-srv (x86_64-unknown-linux-gnu) |
| Building rustdoc for stage1 (x86_64-unknown-linux-gnu) |
| ``` |
| |
| ### Building stage0 {std,compiler} artifacts |
| |
| These steps use the provided (downloaded, usually) compiler to compile the local |
| Rust source into libraries we can use. |
| |
| ### Copying stage0 \{std,rustc\} |
| |
| This copies the library and compiler artifacts from `cargo` into |
| `stage0-sysroot/lib/rustlib/{target-triple}/lib` |
| |
| ### Assembling stage1 compiler |
| |
| This copies the libraries we built in "building `stage0` ... artifacts" into the |
| `stage1` compiler's `lib/` directory. These are the host libraries that the |
| compiler itself uses to run. These aren't actually used by artifacts the new |
| compiler generates. This step also copies the `rustc` and `rustdoc` binaries we |
| generated into `build/$HOST/stage/bin`. |
| |
| The `stage1/bin/rustc` is a fully functional compiler, but it doesn't yet have |
| any libraries to link built binaries or libraries to. The next 3 steps will |
| provide those libraries for it; they are mostly equivalent to constructing the |
| `stage1/bin` compiler so we don't go through them individually here. |