| # Profile-guided optimization |
| |
| `rustc` supports profile-guided optimization (PGO). |
| This chapter describes what PGO is and how the support for it is |
| implemented in `rustc`. |
| |
| ## What is profile-guided optimization? |
| |
| The basic concept of PGO is to collect data about the typical execution of |
| a program (e.g. which branches it is likely to take) and then use this data |
| to inform optimizations such as inlining, machine-code layout, |
| register allocation, etc. |
| |
| There are different ways of collecting data about a program's execution. |
| One is to run the program inside a profiler (such as `perf`) and another |
| is to create an instrumented binary, that is, a binary that has data |
| collection built into it, and run that. |
| The latter usually provides more accurate data. |
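
To make the idea of an instrumented binary concrete, here is a minimal hand-written sketch. It is not how LLVM instruments code (LLVM inserts its counters automatically); the `classify` function and the counter names are hypothetical and exist only for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical hand-placed counters, one per branch of `classify`.
// Real instrumentation is inserted by the compiler, not written by hand.
static SMALL_TAKEN: AtomicU64 = AtomicU64::new(0);
static LARGE_TAKEN: AtomicU64 = AtomicU64::new(0);

/// Classifies a value and records which branch was taken.
fn classify(x: u32) -> &'static str {
    if x < 100 {
        SMALL_TAKEN.fetch_add(1, Ordering::Relaxed);
        "small"
    } else {
        LARGE_TAKEN.fetch_add(1, Ordering::Relaxed);
        "large"
    }
}

/// Returns the recorded (small, large) branch counts.
fn branch_counts() -> (u64, u64) {
    (
        SMALL_TAKEN.load(Ordering::Relaxed),
        LARGE_TAKEN.load(Ordering::Relaxed),
    )
}

fn main() {
    // A "typical" workload in which most inputs are small.
    for x in 0..1000u32 {
        classify(x % 110);
    }
    let (small, large) = branch_counts();
    println!("small branch: {}, large branch: {}", small, large);
}
```

The skewed counts collected by such a run are the kind of data an optimizer can use, for example to lay out the frequently taken branch as the fall-through path.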
| |
| ## How is PGO implemented in `rustc`? |
| |
| `rustc`'s current PGO implementation relies entirely on LLVM. |
| LLVM actually [supports multiple forms][clang-pgo] of PGO: |
| |
| [clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization |
| |
| - Sampling-based PGO where an external profiling tool like `perf` is used |
| to collect data about a program's execution. |
| - GCOV-based profiling, where code coverage infrastructure is used to collect |
| profiling information. |
| - Front-end based instrumentation, where the compiler front-end (e.g. Clang) |
| inserts instrumentation intrinsics into the LLVM IR it generates (but see the |
| note on coverage instrumentation[^note-instrument-coverage]). |
| - IR-level instrumentation, where LLVM inserts the instrumentation intrinsics |
| itself during optimization passes. |
| |
| `rustc` supports only the last approach, IR-level instrumentation, mainly |
| because it is implemented almost entirely in LLVM and needs little |
| maintenance on the Rust side. Fortunately, it is also the most modern approach, |
| yielding the best results. |
| |
| So, we are dealing with an instrumentation-based approach, i.e. profiling data |
| is generated by a specially instrumented version of the program that's being |
| optimized. Instrumentation-based PGO has two components: a compile-time |
| component and a run-time component. One needs to understand the overall |
| workflow to see how they interact. |
| |
| [^note-instrument-coverage]: Note: `rustc` now supports front-end-based coverage |
| instrumentation, via the experimental option |
| [`-C instrument-coverage`](./llvm-coverage-instrumentation.md), but using these |
| coverage results for PGO has not been attempted at this time. |
| |
| ### Overall workflow |
| |
| Generating a PGO-optimized program involves the following four steps: |
| |
| 1. Compile the program with instrumentation enabled (e.g. `rustc -C profile-generate main.rs`). |
| 2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file. |
| 3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool |
| (e.g. `llvm-profdata merge -o merged.profdata <profraw-files>`). |
| 4. Compile the program again, this time making use of the profiling data |
| (e.g. `rustc -C profile-use=merged.profdata main.rs`). |
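
The four steps above can be sketched as a small Rust program that assembles the command line for each step and prints it. This is only an illustration: the helper functions and the `pgo-data` directory name are hypothetical, and the sketch assumes that `rustc` and a matching `llvm-profdata` are available on `PATH` when the printed commands are actually run.

```rust
/// Step 1: compile with instrumentation; raw profiles are written to `profile_dir`.
fn instrument_args(source: &str, profile_dir: &str) -> Vec<String> {
    vec![
        "rustc".to_string(),
        "-C".to_string(),
        format!("profile-generate={}", profile_dir),
        source.to_string(),
    ]
}

/// Step 3: merge the raw profiles in `profile_dir` into a single `.profdata` file.
fn merge_args(profile_dir: &str, profdata: &str) -> Vec<String> {
    vec![
        "llvm-profdata".to_string(),
        "merge".to_string(),
        "-o".to_string(),
        profdata.to_string(),
        profile_dir.to_string(),
    ]
}

/// Step 4: compile again, letting LLVM use the merged profile.
fn optimize_args(source: &str, profdata: &str) -> Vec<String> {
    vec![
        "rustc".to_string(),
        "-C".to_string(),
        format!("profile-use={}", profdata),
        source.to_string(),
    ]
}

fn main() {
    // Hypothetical file and directory names, chosen for illustration.
    let (source, dir, profdata) = ("main.rs", "pgo-data", "merged.profdata");
    let steps = [
        instrument_args(source, dir),
        // Step 2: run the instrumented binary on a typical workload.
        vec!["./main".to_string()],
        merge_args(dir, profdata),
        optimize_args(source, profdata),
    ];
    // Print the commands instead of executing them.
    for (i, argv) in steps.iter().enumerate() {
        println!("{}. {}", i + 1, argv.join(" "));
    }
}
```

The sketch only prints the commands. To execute them, each argument vector could be handed to `std::process::Command`, with the first element as the program and the rest as its arguments.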
| |
| ### Compile-time aspects |
| |
| Depending on which step in the above workflow we are in, two different things |
| can happen at compile time: |
| |
| #### Create binaries with instrumentation |
| |
| As mentioned above, the profiling instrumentation is added by LLVM. |
| `rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager] |
| flags when creating LLVM `PassManager`s: |
| |
| ```C |
| // `PMBR` is an `LLVMPassManagerBuilderRef` |
| unwrap(PMBR)->EnablePGOInstrGen = true; |
| // Instrumented binaries have a default output path for the `.profraw` file |
| // hard-coded into them: |
| unwrap(PMBR)->PGOInstrGen = PGOGenPath; |
| ``` |
| |
| `rustc` also has to make sure that some of the symbols from LLVM's profiling |
| runtime are not removed [by marking them with the right export level][pgo-gen-symbols]. |
| |
| [pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416 |
| [pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225 |
| |
| |
| #### Compile binaries where optimizations make use of profiling data |
| |
| In the final step of the workflow described above, the program is compiled |
| again, with the compiler using the gathered profiling data in order to drive |
| optimization decisions. `rustc` again leaves most of the work to LLVM here, |
| basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder` |
| where the profiling data can be found: |
| |
| ```C |
| unwrap(PMBR)->PGOInstrUse = PGOUsePath; |
| ``` |
| |
| [pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420 |
| |
| LLVM does the rest (e.g. setting branch weights, marking functions with |
| `cold` or `inlinehint`, etc.). |
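
For intuition about what such hints express, Rust lets programmers attach comparable hints by hand with the `#[cold]` and `#[inline]` attributes, whereas PGO derives this kind of information from measured data. The following is a minimal sketch with hypothetical function names. It shows the manual analogue only and says nothing about how LLVM applies the profile.

```rust
/// Rarely executed error path, marked by hand as unlikely to be called.
#[cold]
fn report_invalid(input: i64) -> Option<u64> {
    eprintln!("invalid input: {}", input);
    None
}

/// Small, frequently called helper; `#[inline]` suggests inlining it.
#[inline]
fn square(x: u64) -> u64 {
    x * x
}

/// Squares non-negative inputs and rejects negative ones via the cold path.
fn process(input: i64) -> Option<u64> {
    if input < 0 {
        return report_invalid(input);
    }
    Some(square(input as u64))
}

fn main() {
    let total: u64 = (0..10).filter_map(process).sum();
    println!("sum of squares: {}", total);
}
```

With PGO, annotations like these do not have to be guessed: the profile records how often each function and branch was actually executed.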
| |
| |
| ### Runtime aspects |
| |
| Instrumentation-based approaches always have a runtime component as well: |
| once we have an instrumented program, it needs to be run in order to |
| generate profiling data, and collecting and persisting that data requires |
| some infrastructure to be in place. |
| |
| In the case of LLVM, these runtime components are implemented in |
| [compiler-rt][compiler-rt-profile] and statically linked into any instrumented |
| binaries. |
| The `rustc` version of this can be found in `library/profiler_builtins`, which |
| essentially packages the C code from `compiler-rt` into a Rust crate. |
| |
| In order for `profiler_builtins` to be built, `profiler = true` must be set |
| in `rustc`'s `bootstrap.toml`. |
| |
| [compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/main/compiler-rt/lib/profile |
| |
| ## Testing PGO |
| |
| Since the PGO workflow spans multiple compiler invocations, most testing happens |
| in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name). |
| There is also a [codegen test][codegen-test] that checks that some expected |
| instrumentation artifacts show up in LLVM IR. |
| |
| [rmake-tests]: https://github.com/rust-lang/rust/tree/master/tests/run-make |
| [codegen-test]: https://github.com/rust-lang/rust/blob/master/tests/codegen-llvm/pgo-instrumentation.rs |
| |
| ## Additional information |
| |
| Clang's documentation contains a good overview of [PGO in LLVM][llvm-pgo]. |
| |
| [llvm-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization |