| # Bootstrapping the Compiler |
| |
| <!-- toc --> |
| |
| |
| [*Bootstrapping*][boot] is the process of using a compiler to compile itself. |
| More accurately, it means using an older compiler to compile a newer version |
| of the same compiler. |
| |
| This raises a chicken-and-egg paradox: where did the first compiler come from? |
| It must have been written in a different language. In Rust's case it was |
| [written in OCaml][ocaml-compiler]. However, it was abandoned long ago, and the |
| only way to build a modern version of rustc is with a slightly less modern |
| version. |
| |
| This is exactly how `x.py` works: it downloads the current beta release of |
| rustc, then uses it to compile the new compiler. |
| |
| ## Stages of bootstrapping |
| |
| Compiling `rustc` is done in stages. |
| |
| ### Stage 0 |
| |
| The stage0 compiler is usually the current _beta_ `rustc` compiler |
| and its associated dynamic libraries, |
| which `x.py` will download for you. |
| (You can also configure `x.py` to use something else.) |
| |
| The stage0 compiler is then used only to compile `rustbuild`, `std`, and `rustc`. |
| When compiling `rustc`, the stage0 compiler uses the freshly compiled `std`. |
| There are two concepts at play here: |
| a compiler (with its set of dependencies) |
| and its 'target' or 'object' libraries (`std` and `rustc`). |
| Both are staged, but in a staggered manner. |
| |
| ### Stage 1 |
| |
| The rustc source code is then compiled with the stage0 compiler to produce the stage1 compiler. |
| |
| ### Stage 2 |
| |
| We then rebuild our stage1 compiler with itself to produce the stage2 compiler. |
| |
| In theory, the stage1 compiler is functionally identical to the stage2 compiler, |
| but in practice there are subtle differences. |
| In particular, the stage1 compiler itself was built by stage0 |
| and hence not by the source in your working directory. |
| This means that the symbol names used in the compiler source |
| may not match the symbol names that would have been made by the stage1 compiler, |
| which can cause problems for dynamic libraries and tests. |
| |
| The `stage2` compiler is the one distributed with `rustup` and all other install methods. |
| However, it takes a very long time to build |
| because one must first build the new compiler with an older compiler |
| and then use that to build the new compiler with itself. |
| For development, you usually only want the `stage1` compiler, |
| which you can build with `x.py build library/std`. |
| See [Building the Compiler](/building/how-to-build-and-run.html#building-the-compiler). |
| |
| ### Stage 3 |
| |
| Stage 3 is optional. To sanity check our new compiler, we |
| can build the libraries with the stage2 compiler. The result ought |
| to be identical to before, unless something has broken. |
| |
| ### Building the stages |
| |
| `x.py` tries to be helpful and pick the stage you most likely meant for each subcommand. |
| These defaults are as follows: |
| |
| - `check`: `--stage 0` |
| - `doc`: `--stage 0` |
| - `build`: `--stage 1` |
| - `test`: `--stage 1` |
| - `dist`: `--stage 2` |
| - `install`: `--stage 2` |
| - `bench`: `--stage 2` |
| |
| You can always override the stage by passing `--stage N` explicitly. |
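
The defaults above, and the way an explicit `--stage N` overrides them, can be modeled in a few lines. This is only an illustrative sketch of the list, not the code `x.py` actually runs; `default_stage` and `effective_stage` are hypothetical names.

```rust
/// Default stage for a subcommand, following the list above.
/// Returns `None` for subcommands the list does not mention.
/// (Hypothetical helper, not part of bootstrap.)
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "check" | "doc" => Some(0),
        "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        _ => None,
    }
}

/// An explicit `--stage N` always wins over the default.
fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
    explicit.or_else(|| default_stage(subcommand))
}

fn main() {
    for cmd in ["check", "doc", "build", "test", "dist", "install", "bench"] {
        // Every name in this array is covered by the match above.
        let stage = default_stage(cmd).expect("listed subcommand");
        println!("{cmd}: --stage {stage}");
    }
    // `x.py build --stage 2` overrides the default of 1.
    println!("build with override: {:?}", effective_stage("build", Some(2)));
}
```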
| |
| For more information about stages, [see below](#understanding-stages-of-bootstrap). |
| |
| ## Complications of bootstrapping |
| |
| Since the build system uses the current beta compiler to build the stage-1 |
| bootstrapping compiler, the compiler source code can't use some features |
| until they reach beta (because otherwise the beta compiler doesn't support |
| them). On the other hand, for [compiler intrinsics][intrinsics] and internal |
| features, the features _have_ to be used. Additionally, the compiler makes |
| heavy use of nightly features (`#![feature(...)]`). How can we resolve this |
| problem? |
| |
| There are two methods used: |
| 1. The build system sets `--cfg bootstrap` when building with `stage0`, so we |
| can use `cfg(not(bootstrap))` to only use features when built with `stage1`. |
| This is useful for e.g. features that were just stabilized, which require |
| `#![feature(...)]` when built with `stage0`, but not for `stage1`. |
| 2. The build system sets `RUSTC_BOOTSTRAP=1`. This special variable means to |
| _break the stability guarantees_ of Rust: it allows using `#![feature(...)]` with |
| a compiler that's not nightly. This should never be used except when |
| bootstrapping the compiler. |
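
As a rough illustration of the first method, the sketch below gates two versions of one function on `cfg(bootstrap)`. The function `stabilized_helper` is hypothetical and exists only for this example; in-tree code more often gates a `#![feature(...)]` attribute or a single item in the same way.

```rust
// `bootstrap` is a custom cfg name. A plain `rustc` invocation never sets it;
// the rustc build system passes `--cfg bootstrap` while building with stage0.
// `stabilized_helper` is a hypothetical function used only for illustration.

/// Selected when the crate is built by the stage0 (beta) compiler.
#[cfg(bootstrap)]
fn stabilized_helper() -> &'static str {
    "stage0 path: built by the beta compiler"
}

/// Selected when the crate is built by stage1 or later.
#[cfg(not(bootstrap))]
fn stabilized_helper() -> &'static str {
    "stage1+ path: built by a compiler from the checked-out source"
}

fn main() {
    println!("{}", stabilized_helper());
}
```

Compiled with an ordinary `rustc`, the `not(bootstrap)` version is the one that gets built; passing `--cfg bootstrap` by hand selects the other. Recent toolchains may warn that `bootstrap` is an unexpected cfg name outside the compiler tree.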
| |
| [boot]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers) |
| [intrinsics]: ../appendix/glossary.md#intrinsic |
| [ocaml-compiler]: https://github.com/rust-lang/rust/tree/ef75860a0a72f79f97216f8aaa5b388d98da6480/src/boot |
| |
| ## Contributing to bootstrap |
| |
| When you use the bootstrap system, you'll call it through `x.py`. |
| However, most of the code lives in `src/bootstrap`. |
| `bootstrap` has a difficult problem: it is written in Rust, yet it is run |
| before the Rust compiler is built! To work around this, there are two |
| components of bootstrap: the main one written in Rust, and `bootstrap.py`. |
| `bootstrap.py` is what gets run by `x.py`. It takes care of downloading the |
| `stage0` compiler, which will then build the bootstrap binary written in |
| Rust. |
| |
| Because there are two separate codebases behind `x.py`, they need to |
| be kept in sync. In particular, both `bootstrap.py` and the bootstrap binary |
| parse `config.toml` and read the same command line arguments. `bootstrap.py` |
| keeps these in sync by setting various environment variables, and the |
| programs sometimes have to add arguments that are explicitly ignored, to be |
| read by the other. |
| |
| ### Adding a setting to config.toml |
| |
| This section is a work in progress. In the meantime, you can see an example |
| contribution [here][bootstrap-build]. |
| |
| [bootstrap-build]: https://github.com/rust-lang/rust/pull/71994 |
| |
| ## Understanding stages of bootstrap |
| |
| ### Overview |
| |
| This is a detailed look into the separate bootstrap stages. |
| |
| The convention `x.py` uses is that: |
| |
| - A `--stage N` flag means to run the stage N compiler (`stageN/rustc`). |
| - A "stage N artifact" is a build artifact that is _produced_ by the stage N compiler. |
| - The stage N+1 compiler is assembled from stage N *artifacts*. This |
| process is called _uplifting_. |
| |
| #### Build artifacts |
| |
| Anything you can build with `x.py` is a _build artifact_. |
| Build artifacts include, but are not limited to: |
| |
| - binaries, like `stage0-rustc/rustc-main` |
| - shared objects, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so` |
| - [rlib] files, like `stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib` |
| - HTML files generated by rustdoc, like `doc/std` |
| |
| [rlib]: ../serialization.md |
| |
| #### Examples |
| |
| - `x.py build --stage 0` means to build with the beta `rustc`. |
| - `x.py doc --stage 0` means to document using the beta `rustdoc`. |
| - `x.py test --stage 0 library/std` means to run tests on the standard library |
| without building `rustc` from source ('build with stage 0, then test the |
| artifacts'). If you're working on the standard library, this is normally the |
| test command you want. |
| - `x.py test src/test/ui` means to build the stage 1 compiler and run |
| `compiletest` on it. If you're working on the compiler, this is normally the |
| test command you want. |
| |
| #### Examples of what *not* to do |
| |
| - `x.py test --stage 0 src/test/ui` is not meaningful: it runs tests on the |
| _beta_ compiler and doesn't build `rustc` from source. Use `test src/test/ui` |
| instead, which builds stage 1 from source. |
| - `x.py test --stage 0 compiler/rustc` builds the compiler but runs no tests: |
| it's running `cargo test -p rustc`, but cargo doesn't understand Rust's |
| tests. You shouldn't need to use this; use `test` instead (without arguments). |
| - `x.py build --stage 0 compiler/rustc` builds the compiler, but does not build |
| libstd or even libcore. Most of the time, you'll want `x.py build |
| library/std` instead, which allows compiling programs without needing to define |
| lang items. |
| |
| ### Building vs. running |
| |
| Note that `build --stage N compiler/rustc` **does not** build the stage N compiler: |
| instead it builds the stage N+1 compiler _using_ the stage N compiler. |
| |
| In short, _stage 0 uses the stage0 compiler to create stage0 artifacts which |
| will later be uplifted to be the stage1 compiler_. |
| |
| In each stage, two major steps are performed: |
| |
| 1. `std` is compiled by the stage N compiler. |
| 2. That `std` is linked to programs built by the stage N compiler, |
| including the stage N artifacts (stage N+1 compiler). |
| |
| This is somewhat intuitive if one thinks of the stage N artifacts as "just" |
| another program we are building with the stage N compiler: |
| `build --stage N compiler/rustc` is linking the stage N artifacts to the `std` |
| built by the stage N compiler. |
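
The relationship between "building with stage N" and "getting the stage N+1 compiler" can be made concrete with a toy model. The types and functions below (`Compiler`, `Artifact`, `build_std`, `build_rustc`, `uplift`) are hypothetical and only mirror the two steps and the uplifting convention described above; they are not bootstrap's real data structures.

```rust
/// A compiler binary available as `stageN/rustc` (toy model).
#[derive(Debug, Clone, PartialEq)]
struct Compiler {
    stage: u32,
}

/// Something produced *by* the stage N compiler, i.e. a "stage N artifact".
#[derive(Debug, Clone, PartialEq)]
struct Artifact {
    name: &'static str,
    built_by_stage: u32,
}

/// Step 1 of a stage: the stage N compiler builds `std`.
fn build_std(compiler: &Compiler) -> Artifact {
    Artifact { name: "std", built_by_stage: compiler.stage }
}

/// Step 2: the stage N compiler builds `rustc`, linked against that `std`.
fn build_rustc(compiler: &Compiler, std: &Artifact) -> Artifact {
    assert_eq!(std.built_by_stage, compiler.stage, "std must come from the same stage");
    Artifact { name: "rustc", built_by_stage: compiler.stage }
}

/// Uplifting: the stage N `rustc` artifact becomes the stage N+1 compiler.
fn uplift(rustc: &Artifact) -> Compiler {
    Compiler { stage: rustc.built_by_stage + 1 }
}

fn main() {
    let mut compiler = Compiler { stage: 0 }; // stands in for the downloaded beta
    for _ in 0..2 {
        let std = build_std(&compiler);
        let rustc = build_rustc(&compiler, &std);
        let next = uplift(&rustc);
        println!(
            "stage{} compiler built {} and {}, uplifted to the stage{} compiler",
            compiler.stage, std.name, rustc.name, next.stage
        );
        compiler = next;
    }
}
```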
| |
| Here is a chart of a full build using `x.py`: |
| |
| <img alt="A diagram of the rustc compilation phases" src="../img/rustc_stages.svg" class="center" /> |
| |
| Keep in mind this diagram is a simplification: for example, `rustdoc` can be built at |
| different stages, and the process is a bit different when passing flags such as |
| `--keep-stage` or when there are non-host targets. |
| |
| ### Stages and `std` |
| |
| Note that there are two `std` libraries in play here: |
| 1. The library _linked_ to `stageN/rustc`, which was built by stage N-1 (stage N-1 `std`) |
| 2. The library _used to compile programs_ with `stageN/rustc`, which was |
| built by stage N (stage N `std`). |
| |
| Stage N `std` is pretty much necessary for any useful work with the stage N compiler. |
| Without it, you can only compile programs with `#![no_core]` -- not terribly useful! |
| |
| These need to be different because they aren't necessarily ABI-compatible: |
| there could be new layout optimizations, changes to MIR, or other changes |
| to Rust metadata on nightly that aren't present in beta. |
| |
| This is also where `--keep-stage 1 library/std` comes into play. Since most |
| changes to the compiler don't actually change the ABI, once you've produced a |
| `std` in stage 1, you can probably just reuse it with a different compiler. |
| If the ABI hasn't changed, you're good to go, no need to spend time |
| recompiling that `std`. |
| `--keep-stage` simply assumes the previous compile is fine and copies those |
| artifacts into the appropriate place, skipping the cargo invocation. |
| |
| ### Cross-compiling rustc |
| |
| *Cross-compiling* is the process of compiling code that will run on another architecture. |
| For instance, you might want to build an ARM version of rustc using an x86 machine. |
| Building stage2 `std` is different when you are cross-compiling. |
| |
| This is because `x.py` uses a trick: if `HOST` and `TARGET` are the same, |
| it will reuse stage1 `std` for stage2! This is sound because stage1 `std` |
| was compiled with the stage1 compiler, i.e. a compiler using the source code |
| you currently have checked out. So it should be identical (and therefore ABI-compatible) |
| to the `std` that `stage2/rustc` would compile. |
| |
| However, when cross-compiling, stage1 `std` will only run on the host. |
| So the stage2 compiler has to recompile `std` for the target. |
| |
| (The stage 2 table below shows that stage2 only builds `std` for non-host targets.) |
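
The host/target rule described in this section boils down to a single comparison. The sketch below is a simplified model under that assumption; `Stage2Std` and `stage2_std` are hypothetical names, and the real build system handles more cases than this.

```rust
/// Where the stage2 `std` for a given target comes from (toy model).
#[derive(Debug, PartialEq)]
enum Stage2Std {
    /// Target equals host: the stage1 `std` is reused as-is.
    ReuseStage1,
    /// Non-host target: the stage2 compiler has to compile `std` for it.
    Rebuild,
}

fn stage2_std(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReuseStage1
    } else {
        Stage2Std::Rebuild
    }
}

fn main() {
    // Example triples only; substitute your own host and targets.
    let host = "x86_64-unknown-linux-gnu";
    for target in [host, "aarch64-unknown-linux-gnu"] {
        println!("{target}: {:?}", stage2_std(host, target));
    }
}
```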
| |
| ### Why does only libstd use `cfg(bootstrap)`? |
| |
| The `rustc` generated by the stage0 compiler is linked to the freshly-built |
| `std`, which means that for the most part only `std` needs to be cfg-gated, |
| so that `rustc` can use features added to std immediately after their addition, |
| without need for them to get into the downloaded beta. |
| |
| Note this is different from any other Rust program: stage1 `rustc` |
| is built by the _beta_ compiler, but using the _master_ version of libstd! |
| |
| The only time `rustc` uses `cfg(bootstrap)` is when it adds internal lints |
| that use diagnostic items. This happens very rarely. |
| |
| ### What is a 'sysroot'? |
| |
| When you build a project with cargo, the build artifacts for dependencies |
| are normally stored in `target/debug/deps`. This only contains dependencies cargo |
| knows about; in particular, it doesn't have the standard library. Where do |
| `std` or `proc_macro` come from? They come from the **sysroot**, the root |
| of a number of directories where the compiler loads build artifacts at runtime. |
| The sysroot doesn't just store the standard library, though - it includes |
| anything that needs to be loaded at runtime. That includes (but is not limited |
| to): |
| |
| - `libstd`/`libtest`/`libproc_macro` |
| - The compiler crates themselves, when using `rustc_private`. In-tree these |
| are always present; out of tree, you need to install `rustc-dev` with rustup. |
| - `libLLVM.so`, the shared object file for the LLVM project. In-tree this is |
| either built from source or downloaded from CI; out-of-tree, you need to |
| install `llvm-tools-preview` with rustup. |
| |
| All the artifacts listed so far are *compiler* runtime dependencies. You can |
| see them with `rustc --print sysroot`: |
| |
| ``` |
| $ ls $(rustc --print sysroot)/lib |
| libchalk_derive-0685d79833dc9b2b.so libstd-25c6acf8063a3802.so |
| libLLVM-11-rust-1.50.0-nightly.so libtest-57470d2aa8f7aa83.so |
| librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so |
| librustc_macros-5f0ec4a119c6ac86.so rustlib |
| ``` |
| |
| There are also runtime dependencies for the standard library! These are in |
| `lib/rustlib`, not `lib/` directly. |
| |
| ``` |
| $ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5 |
| libaddr2line-6c8e02b8fedc1e5f.rlib |
| libadler-9ef2480568df55af.rlib |
| liballoc-9c4002b5f79ba0e1.rlib |
| libcfg_if-512eb53291f6de7e.rlib |
| libcompiler_builtins-ef2408da76957905.rlib |
| ``` |
| |
| `rustlib` includes libraries like `hashbrown` and `cfg_if`, which are not part |
| of the public API of the standard library, but are used to implement it. |
| `rustlib` is part of the search path for linkers, but `lib` will never be part |
| of the search path. |
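
To make the two locations concrete, here is a small sketch that asks whichever `rustc` is on `PATH` for its sysroot and derives both directories from it. The helper names (`rustlib_dir`, `query_sysroot`) are hypothetical, and the layout assumed is the Linux one shown in the listings above.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Directory holding the target libraries (`std` and its dependencies),
/// assuming the `lib/rustlib/<target>/lib` layout from the listing above.
fn rustlib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

/// Runs `rustc --print sysroot` and returns the printed path.
fn query_sysroot() -> std::io::Result<PathBuf> {
    let output = Command::new("rustc").args(["--print", "sysroot"]).output()?;
    let text = String::from_utf8_lossy(&output.stdout);
    Ok(PathBuf::from(text.trim()))
}

fn main() -> std::io::Result<()> {
    let sysroot = query_sysroot()?;
    // Compiler runtime dependencies sit directly in `lib`.
    println!("compiler libs: {}", sysroot.join("lib").display());
    // The triple is hard-coded to match the listing above; adjust it for your host.
    let target_libs = rustlib_dir(&sysroot, "x86_64-unknown-linux-gnu");
    println!("target libs:   {}", target_libs.display());
    Ok(())
}
```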
| |
| #### -Z force-unstable-if-unmarked |
| |
| Since `rustlib` is part of the search path, we have to be careful |
| about which crates are included in it. In particular, all crates except for |
| the standard library are built with the flag `-Z force-unstable-if-unmarked`, |
| which means that you have to use `#![feature(rustc_private)]` in order to |
| load them (as opposed to the standard library, which is always available). |
| |
| The `-Z force-unstable-if-unmarked` flag has a variety of purposes to help |
| enforce that the correct crates are marked as unstable. It was introduced |
| primarily to allow rustc and the standard library to link to arbitrary crates |
| on crates.io which do not themselves use `staged_api`. `rustc` also relies on |
| this flag to mark all of its crates as unstable with the `rustc_private` |
| feature so that each crate does not need to be carefully marked with |
| `unstable`. |
| |
| This flag is automatically applied to all of `rustc` and the standard library |
| by the bootstrap scripts. This is needed because the compiler and all of its |
| dependencies are shipped in the sysroot to all users. |
| |
| This flag has the following effects: |
| |
| - Marks the crate as "unstable" with the `rustc_private` feature if it is not |
| itself marked as stable or unstable. |
| - Allows these crates to access other forced-unstable crates without any need |
| for attributes. Normally a crate would need a `#![feature(rustc_private)]` |
| attribute to use other unstable crates. However, that would make it |
| impossible for a crate from crates.io to access its own dependencies since |
| that crate won't have a `feature(rustc_private)` attribute, but *everything* |
| is compiled with `-Z force-unstable-if-unmarked`. |
| |
| Code which does not use `-Z force-unstable-if-unmarked` should include the |
| `#![feature(rustc_private)]` crate attribute to access these force-unstable |
| crates. This is needed for things that link `rustc`, such as `miri`, `rls`, or |
| `clippy`. |
| |
| You can find more discussion about sysroots in: |
| - The [rustdoc PR] explaining why it uses `extern crate` for dependencies loaded from sysroot |
| - [Discussions about sysroot on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/deps.20in.20sysroot/) |
| - [Discussions about building rustdoc out of tree](https://rust-lang.zulipchat.com/#narrow/stream/182449-t-compiler.2Fhelp/topic/How.20to.20create.20an.20executable.20accessing.20.60rustc_private.60.3F) |
| |
| [rustdoc PR]: https://github.com/rust-lang/rust/pull/76728 |
| |
| ### Directories and artifacts generated by x.py |
| |
| The following tables indicate the outputs of various stage actions: |
| |
| | Stage 0 Action | Output | |
| |-----------------------------------------------------------|----------------------------------------------| |
| | `beta` extracted | `build/HOST/stage0` | |
| | `stage0` builds `bootstrap` | `build/bootstrap` | |
| | `stage0` builds `test`/`std` | `build/HOST/stage0-std/TARGET` | |
| | copy `stage0-std` (HOST only) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` | |
| | `stage0` builds `rustc` with `stage0-sysroot` | `build/HOST/stage0-rustc/HOST` | |
| | copy `stage0-rustc` (except executable) | `build/HOST/stage0-sysroot/lib/rustlib/HOST` | |
| | build `llvm` | `build/HOST/llvm` | |
| | `stage0` builds `codegen` with `stage0-sysroot` | `build/HOST/stage0-codegen/HOST` | |
| | `stage0` builds `rustdoc`, `clippy`, `miri` with `stage0-sysroot` | `build/HOST/stage0-tools/HOST` | |
| |
| `--stage=0` stops here. |
| |
| | Stage 1 Action | Output | |
| |-----------------------------------------------------|---------------------------------------| |
| | copy (uplift) `stage0-rustc` executable to `stage1` | `build/HOST/stage1/bin` | |
| | copy (uplift) `stage0-codegen` to `stage1` | `build/HOST/stage1/lib` | |
| | copy (uplift) `stage0-sysroot` to `stage1` | `build/HOST/stage1/lib` | |
| | `stage1` builds `test`/`std` | `build/HOST/stage1-std/TARGET` | |
| | copy `stage1-std` (HOST only) | `build/HOST/stage1/lib/rustlib/HOST` | |
| | `stage1` builds `rustc` | `build/HOST/stage1-rustc/HOST` | |
| | copy `stage1-rustc` (except executable) | `build/HOST/stage1/lib/rustlib/HOST` | |
| | `stage1` builds `codegen` | `build/HOST/stage1-codegen/HOST` | |
| |
| `--stage=1` stops here. |
| |
| | Stage 2 Action | Output | |
| |--------------------------------------------------------|-----------------------------------------------------------------| |
| | copy (uplift) `stage1-rustc` executable | `build/HOST/stage2/bin` | |
| | copy (uplift) `stage1-sysroot` | `build/HOST/stage2/lib` and `build/HOST/stage2/lib/rustlib/HOST` | |
| | `stage2` builds `test`/`std` (not HOST targets) | `build/HOST/stage2-std/TARGET` | |
| | copy `stage2-std` (not HOST targets) | `build/HOST/stage2/lib/rustlib/TARGET` | |
| | `stage2` builds `rustdoc`, `clippy`, `miri` | `build/HOST/stage2-tools/HOST` | |
| | copy `rustdoc` | `build/HOST/stage2/bin` | |
| |
| `--stage=2` stops here. |
| |
| ## Passing stage-specific flags to `rustc` |
| |
| `x.py` allows you to pass stage-specific flags to `rustc` when bootstrapping. |
| The `RUSTFLAGS_BOOTSTRAP` environment variable is passed as `RUSTFLAGS` to the bootstrap stage |
| (stage0), and `RUSTFLAGS_NOT_BOOTSTRAP` is passed when building artifacts for later stages. |
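
The selection rule can be pictured as follows. This is a hedged model of the sentence above rather than bootstrap's implementation; `rustflags_var` and `rustflags_for_stage` are hypothetical helpers.

```rust
use std::env;

/// Name of the environment variable consulted for a given stage,
/// following the description above (hypothetical helper).
fn rustflags_var(stage: u32) -> &'static str {
    if stage == 0 {
        "RUSTFLAGS_BOOTSTRAP"
    } else {
        "RUSTFLAGS_NOT_BOOTSTRAP"
    }
}

/// Flags that would be forwarded as `RUSTFLAGS` for that stage;
/// empty when the variable is unset or not valid Unicode.
fn rustflags_for_stage(stage: u32) -> String {
    env::var(rustflags_var(stage)).unwrap_or_default()
}

fn main() {
    for stage in 0..3 {
        println!(
            "stage{stage}: {} = {:?}",
            rustflags_var(stage),
            rustflags_for_stage(stage)
        );
    }
}
```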
| |
| ## Environment Variables |
| |
| During bootstrapping, a number of compiler-internal environment |
| variables are used. If you are trying to run an intermediate version of |
| `rustc`, you may sometimes need to set some of these environment variables |
| manually. Otherwise, you get an error like the following: |
| |
| ```text |
| thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5 |
| ``` |
| |
| If `./stageN/bin/rustc` gives an error about environment variables, that |
| usually means something is quite wrong -- or you're trying to compile e.g. |
| `rustc` or `std` or something that depends on environment variables. In |
| the unlikely case that you actually need to invoke rustc in such a situation, |
| you can find the environment variable values by adding the following flag to |
| your `x.py` command: `--on-fail=print-env`. |