| # Profiling the compiler |
| |
| This section talks about how to profile the compiler and find out where it spends its time. |
| |
| Depending on what you're trying to measure, there are several different approaches: |
| |
| - If you want to see if a PR improves or regresses compiler performance, |
| see the [rustc-perf chapter](tests/perf.md) for requesting a benchmarking run. |
| |
| - If you want a medium-to-high level overview of where `rustc` is spending its time: |
| - The `-Z self-profile` flag and [measureme](https://github.com/rust-lang/measureme) tools offer a query-based approach to profiling. |
| See [their docs](https://github.com/rust-lang/measureme/blob/master/summarize/README.md) for more information. |
| |
| - If you want function level performance data or even just more details than the above approaches: |
| - Consider using a native code profiler such as [perf](profiling/with_perf.md) |
| - or [tracy](https://github.com/nagisa/rust_tracy_client) for a nanosecond-precision, |
| full-featured graphical interface. |
| |
| - If you want a nice visual representation of the compile times of your crate graph, |
| you can use [cargo's `--timings` flag](https://doc.rust-lang.org/nightly/cargo/reference/timings.html), |
| e.g. `cargo build --timings`. |
| You can use this flag on the compiler itself with `CARGOFLAGS="--timings" ./x build` |
| |
| - If you want to profile memory usage, you can use various tools depending on what operating system |
| you are using. |
| - For Windows, read our [WPA guide](profiling/wpa_profiling.md). |
| |
| ## Optimizing rustc's bootstrap times with `cargo-llvm-lines` |
| |
| Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the |
| number of lines of LLVM IR across all instantiations of a generic function. |
| Since most of the time compiling rustc is spent in LLVM, the idea is that by |
| reducing the amount of code passed to LLVM, compiling rustc gets faster. |
| |
| To use `cargo-llvm-lines` together with somewhat custom rustc build process, you can use |
| `-C save-temps` to obtain required LLVM IR. The option preserves temporary work products |
| created during compilation. Among those is LLVM IR that represents an input to the |
| optimization pipeline; ideal for our purposes. It is stored in files with `*.no-opt.bc` |
| extension in LLVM bitcode format. |
| |
| Example usage: |
| ``` |
| cargo install cargo-llvm-lines |
| # On a normal crate you could now run `cargo llvm-lines`, but `x` isn't normal :P |
| |
| # Do a clean before every run, to not mix in the results from previous runs. |
| ./x clean |
| env RUSTFLAGS=-Csave-temps ./x build --stage 0 compiler/rustc |
| |
| # Single crate, e.g., rustc_middle. (Relies on the glob support of your shell.) |
| # Convert unoptimized LLVM bitcode into a human readable LLVM assembly accepted by cargo-llvm-lines. |
| for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.no-opt.bc; do |
| ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f" |
| done |
| cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/rustc_middle-*.ll > llvm-lines-middle.txt |
| |
| # Specify all crates of the compiler. |
| for f in build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.no-opt.bc; do |
| ./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-dis "$f" |
| done |
| cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/release/deps/*.ll > llvm-lines.txt |
| ``` |
| |
| Example output for the compiler: |
| ``` |
| Lines Copies Function name |
| ----- ------ ------------- |
| 45207720 (100%) 1583774 (100%) (TOTAL) |
| 2102350 (4.7%) 146650 (9.3%) core::ptr::drop_in_place |
| 615080 (1.4%) 8392 (0.5%) std::thread::local::LocalKey<T>::try_with |
| 594296 (1.3%) 1780 (0.1%) hashbrown::raw::RawTable<T>::rehash_in_place |
| 592071 (1.3%) 9691 (0.6%) core::option::Option<T>::map |
| 528172 (1.2%) 5741 (0.4%) core::alloc::layout::Layout::array |
| 466854 (1.0%) 8863 (0.6%) core::ptr::swap_nonoverlapping_one |
| 412736 (0.9%) 1780 (0.1%) hashbrown::raw::RawTable<T>::resize |
| 367776 (0.8%) 2554 (0.2%) alloc::raw_vec::RawVec<T,A>::grow_amortized |
| 367507 (0.8%) 643 (0.0%) rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl |
| 355882 (0.8%) 6332 (0.4%) alloc::alloc::box_free |
| 354556 (0.8%) 14213 (0.9%) core::ptr::write |
| 354361 (0.8%) 3590 (0.2%) core::iter::traits::iterator::Iterator::fold |
| 347761 (0.8%) 3873 (0.2%) rustc_middle::ty::context::tls::set_tlv |
| 337534 (0.7%) 2377 (0.2%) alloc::raw_vec::RawVec<T,A>::allocate_in |
| 331690 (0.7%) 3192 (0.2%) hashbrown::raw::RawTable<T>::find |
| 328756 (0.7%) 3978 (0.3%) rustc_middle::ty::context::tls::with_context_opt |
| 326903 (0.7%) 642 (0.0%) rustc_query_system::query::plumbing::try_execute_query |
| ``` |
| |
| Since this doesn't seem to work with incremental compilation or `./x check`, |
| you will be compiling rustc _a lot_. |
| I recommend changing a few settings in `bootstrap.toml` to make it bearable: |
| ``` |
| [rust] |
| # A debug build takes _a third_ as long on my machine, |
| # but compiling more than stage0 rustc becomes unbearably slow. |
| optimize = false |
| |
| # We can't use incremental anyway, so we disable it for a little speed boost. |
| incremental = false |
| # We won't be running it, so no point in compiling debug checks. |
| debug = false |
| |
| # Using a single codegen unit gives less output, but is slower to compile. |
| codegen-units = 0 # num_cpus |
| ``` |
| |
| The llvm-lines output is affected by several options. |
| `optimize = false` increases it from 2.1GB to 3.5GB and `codegen-units = 0` to 4.1GB. |
| |
| MIR optimizations have little impact. Compared to the default `RUSTFLAGS="-Z |
| mir-opt-level=1"`, level 0 adds 0.3GB and level 2 removes 0.2GB. |
| As of <!-- date-check --> July 2022, |
| inlining happens in LLVM and GCC codegen backends, |
| missing only in the Cranelift one. |