# Profile-guided optimization

<!-- toc -->

`rustc` supports profile-guided optimization (PGO).
This chapter describes what PGO is and how support for it is
implemented in `rustc`.

## What is profile-guided optimization?

The basic concept of PGO is to collect data about the typical execution of
a program (e.g. which branches it is likely to take) and then use this data
to inform optimizations such as inlining, machine-code layout,
register allocation, etc.

There are different ways of collecting data about a program's execution.
One is to run the program inside a profiler (such as `perf`); another
is to create an instrumented binary, that is, a binary that has data
collection built into it, and run that.
The latter usually provides more accurate data, because it records exact
execution counts rather than statistical samples.
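As a rough illustration of what "data collection built into" a binary means, the following hand-written C sketch mimics the counters an instrumenting compiler adds on each branch edge. The names `edge_counters`, `classify` and `dump_counters` are hypothetical; LLVM's real instrumentation places its counters in dedicated sections and relies on its profiling runtime to write them out in a binary format.

```C
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One counter per instrumented branch edge (hypothetical layout). */
uint64_t edge_counters[2];

/* The program's own logic. The two increments are the kind of code an
 * instrumenting compiler inserts; the uninstrumented function has neither. */
int classify(int x) {
    if (x < 0) {
        edge_counters[0]++; /* "negative" edge taken */
        return -1;
    }
    edge_counters[1]++;     /* "non-negative" edge taken */
    return 1;
}

/* Report the collected counts. A real profiling runtime would instead
 * write them to a `.profraw` file when the program exits. */
void dump_counters(FILE *out) {
    assert(out != NULL);
    fprintf(out, "negative: %llu\nnon-negative: %llu\n",
            (unsigned long long)edge_counters[0],
            (unsigned long long)edge_counters[1]);
}
```

Running `classify` over a typical workload and then calling `dump_counters(stdout)` shows how often each branch was taken, which is the kind of information a later, profile-using compilation can exploit.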

## How is PGO implemented in `rustc`?

`rustc`'s current PGO implementation relies entirely on LLVM.
LLVM actually [supports multiple forms][clang-pgo] of PGO:

[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

- Sampling-based PGO, where an external profiling tool like `perf` is used
  to collect data about a program's execution.
- GCOV-based profiling, where code coverage infrastructure is used to collect
  profiling information.
- Front-end-based instrumentation, where the compiler front-end (e.g. Clang)
  inserts instrumentation intrinsics into the LLVM IR it generates (but see the
  note[^note-instrument-coverage]).
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics
  itself during optimization passes.

`rustc` supports only the last approach, IR-level instrumentation, mainly
because it is almost exclusively implemented in LLVM and needs little
maintenance on the Rust side. Fortunately, it is also the most modern approach,
yielding the best results.

So, we are dealing with an instrumentation-based approach, i.e. profiling data
is generated by a specially instrumented version of the program that's being
optimized. Instrumentation-based PGO has two components: a compile-time
component and a run-time component, and one needs to understand the overall
workflow to see how they interact.

[^note-instrument-coverage]: `rustc` now supports front-end-based coverage
instrumentation via the option
[`-C instrument-coverage`](./llvm-coverage-instrumentation.md), but using these
coverage results for PGO has not yet been attempted.

### Overall workflow

Generating a PGO-optimized program involves the following four steps:

1. Compile the program with instrumentation enabled (e.g. `rustc -C profile-generate main.rs`).
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file.
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool
   (e.g. `llvm-profdata merge -o merged.profdata default-<id>.profraw`).
4. Compile the program again, this time making use of the profiling data
   (e.g. `rustc -C profile-use=merged.profdata main.rs`).
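`llvm-profdata merge` accepts any number of `.profraw` inputs, which matters because the instrumented program is often run several times, for example once per representative workload. Conceptually, merging instrumentation profiles sums the counter for each instrumented location across runs. The C sketch below shows only that arithmetic, on a hypothetical fixed-size `raw_profile` type; real `.profraw`/`.profdata` files also carry function names, hashes and other metadata, and the real tool handles overflow and supports per-input weights, all of which is omitted here.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_COUNTERS 4

/* Hypothetical, fixed-size stand-in for the counters of one profiling run. */
typedef struct {
    uint64_t counts[NUM_COUNTERS];
} raw_profile;

/* Merge `num_runs` profiles by summing each counter across all runs. */
raw_profile merge_profiles(const raw_profile *runs, size_t num_runs) {
    raw_profile merged = {{0}};
    assert(runs != NULL || num_runs == 0);
    for (size_t r = 0; r < num_runs; r++) {
        for (size_t i = 0; i < NUM_COUNTERS; i++) {
            merged.counts[i] += runs[r].counts[i];
        }
    }
    return merged;
}
```

Summing is what makes it fine to profile several runs independently: a branch taken 3 times in one run and 5 times in another ends up with a count of 8, just as if both workloads had run in a single process.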

### Compile-time aspects

Depending on which step in the above workflow we are in, two different things
can happen at compile time:

#### Create binaries with instrumentation

As mentioned above, the profiling instrumentation is added by LLVM.
`rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager]
flags when creating LLVM `PassManager`s (the linked code is from Rust 1.34.1,
which used LLVM's legacy `PassManagerBuilder`):

```C
// `PMBR` is an `LLVMPassManagerBuilderRef`
unwrap(PMBR)->EnablePGOInstrGen = true;
// Instrumented binaries have a default output path for the `.profraw` file
// hard-coded into them:
unwrap(PMBR)->PGOInstrGen = PGOGenPath;
```

`rustc` also has to make sure that some of the symbols from LLVM's profiling
runtime are not removed [by marking them with the right export level][pgo-gen-symbols].

[pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416
[pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225

#### Compile binaries where optimizations make use of profiling data

In the final step of the workflow described above, the program is compiled
again, with the compiler using the gathered profiling data in order to drive
optimization decisions. `rustc` again leaves most of the work to LLVM here,
basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder`
where the profiling data can be found:

```C
// Path to the `.profdata` file given via `-C profile-use`
unwrap(PMBR)->PGOInstrUse = PGOUsePath;
```

[pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420

LLVM does the rest (e.g. setting branch weights, marking functions with
`cold` or `inlinehint`, etc.).
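To give a feel for the arithmetic behind these decisions, the following C sketch turns profile counts into the kind of quantities an optimizer reasons about. It is a simplification under stated assumptions: LLVM attaches raw counts to branches as `branch_weights` metadata and derives probabilities from them, and it classifies functions as hot or cold using a profile summary and thresholds, whereas the hypothetical `taken_probability` and `is_cold_candidate` helpers below use a plain ratio and a "never executed" check.

```C
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Execution counts for a two-way branch, as read from merged profile data. */
typedef struct {
    uint64_t taken;
    uint64_t not_taken;
} branch_counts;

/* Probability that the branch is taken, computed from its raw counts.
 * With no recorded executions, this sketch assumes a 50/50 split. */
double taken_probability(branch_counts c) {
    uint64_t total = c.taken + c.not_taken;
    assert(total >= c.taken); /* catch counter overflow in this toy */
    if (total == 0) {
        return 0.5;
    }
    return (double)c.taken / (double)total;
}

/* Toy coldness check: a function never entered while profiling is a
 * candidate for the `cold` attribute. */
bool is_cold_candidate(uint64_t entry_count) {
    return entry_count == 0;
}
```

With 990 taken versus 10 not-taken executions, the branch is taken 99% of the time, so an optimizer would typically lay out the taken path as the fall-through and may move the rarely executed path out of line.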

### Runtime aspects

Instrumentation-based approaches always have a runtime component as well:
once we have an instrumented program, that program needs to be run in order
to generate profiling data, and collecting and persisting this profiling
data requires some supporting infrastructure.

In the case of LLVM, these runtime components are implemented in
[compiler-rt][compiler-rt-profile] and statically linked into any instrumented
binaries.
The `rustc` version of this can be found in `library/profiler_builtins`, which
basically packs the C code from `compiler-rt` into a Rust crate.
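The following C sketch imitates, in a much reduced form, the job this runtime does: hold the counters, pick an output path, and write the data out when the program exits normally. All function names and the default path are hypothetical. The one behaviour borrowed from the real runtime is that the `LLVM_PROFILE_FILE` environment variable overrides the path hard-coded into the binary; the real runtime additionally expands patterns such as `%p` and `%m` in that variable and writes a binary `.profraw` format rather than text, neither of which this sketch attempts.

```C
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_COUNTERS 3

/* Counters bumped by instrumented code while the program runs. */
uint64_t profile_counters[NUM_COUNTERS];

/* Hypothetical output path "baked into" the binary at compile time. */
static const char *const DEFAULT_PROFILE_PATH = "default.profraw.txt";

/* Choose where to write: LLVM_PROFILE_FILE, if set and non-empty,
 * overrides the baked-in default (no %p/%m pattern expansion here). */
const char *profile_output_path(void) {
    const char *env = getenv("LLVM_PROFILE_FILE");
    return (env != NULL && env[0] != '\0') ? env : DEFAULT_PROFILE_PATH;
}

/* Write all counters as text, one per line. Returns 0 on success. */
int write_profile(const char *path) {
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    for (size_t i = 0; i < NUM_COUNTERS; i++) {
        fprintf(f, "%llu\n", (unsigned long long)profile_counters[i]);
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* atexit() handler: persist the profile on normal program termination. */
static void write_profile_at_exit(void) {
    (void)write_profile(profile_output_path());
}

/* Register the exit handler; meant to be called once at program startup. */
void profile_runtime_init(void) {
    int rc = atexit(write_profile_at_exit);
    assert(rc == 0);
    (void)rc;
}
```

In this sketch an instrumented program would call `profile_runtime_init()` once at startup. The real runtime needs no such call, since it hooks itself into program startup and exit automatically.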

In order for `profiler_builtins` to be built, `profiler = true` must be set
in the `[build]` section of `rustc`'s `bootstrap.toml`.

[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/main/compiler-rt/lib/profile

## Testing PGO

Since the PGO workflow spans multiple compiler invocations, most testing happens
in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name).
There is also a [codegen test][codegen-test] that checks that some expected
instrumentation artifacts show up in LLVM IR.

[rmake-tests]: https://github.com/rust-lang/rust/tree/master/tests/run-make
[codegen-test]: https://github.com/rust-lang/rust/blob/master/tests/codegen/pgo-instrumentation.rs

## Additional information

Clang's documentation contains a good overview of [PGO in LLVM][llvm-pgo].

[llvm-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization