# Parallel compilation

<div class="warning">
As of <!-- date-check --> November 2024,
the parallel front-end is undergoing significant changes,
so this page contains quite a bit of outdated information.

Tracking issue: <https://github.com/rust-lang/rust/issues/113349>
</div>

As of <!-- date-check --> November 2024, most of the Rust compiler is now
parallelized.

- The codegen phase is executed concurrently by default. You can use the
`-C codegen-units=n` option to control the number of concurrent tasks.
- The phases after HIR lowering and before codegen, such as type checking, borrow
checking, and MIR optimization, are parallelized in the nightly compiler.
Currently, they are executed serially by default, and parallelization must be
enabled manually with the `-Z threads=n` option.
- Other phases, such as lexing, parsing, HIR lowering, and macro expansion, are
still executed serially.

<div class="warning">
The following sections are kept for now but are quite outdated.
</div>

---

[codegen]: backend/codegen.md

## Code generation

During monomorphization, the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary. This process
occurs in the [`rustc_codegen_ssa::base`] module.

[`rustc_codegen_ssa::base`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/index.html
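
As a rough illustration of that flow, here is a dependency-free sketch; it is not rustc's code. `CodegenUnit`, `partition`, `codegen`, and `build` are hypothetical names, the round-robin split is much simpler than rustc's real partitioning, and formatting a string stands in for running an LLVM instance. Each unit is processed on its own scoped thread, and the resulting "objects" are joined in unit order as a stand-in for the final link step.

```rust
use std::thread;

/// A hypothetical, heavily simplified codegen unit: a name plus the items in it.
struct CodegenUnit {
    name: String,
    items: Vec<String>,
}

/// Distribute items round-robin over at most `n` units, dropping empty ones.
fn partition(items: &[&str], n: usize) -> Vec<CodegenUnit> {
    let n = n.max(1);
    let mut cgus: Vec<CodegenUnit> = (0..n)
        .map(|i| CodegenUnit { name: format!("cgu.{i}"), items: Vec::new() })
        .collect();
    for (i, item) in items.iter().enumerate() {
        cgus[i % n].items.push(item.to_string());
    }
    cgus.retain(|cgu| !cgu.items.is_empty());
    cgus
}

/// Stand-in for one LLVM instance: turn a unit into a fake "object file".
fn codegen(cgu: &CodegenUnit) -> String {
    format!("{}.o[{}]", cgu.name, cgu.items.join(","))
}

/// Generate every unit on its own scoped thread, then "link" the objects.
fn build(items: &[&str], codegen_units: usize) -> String {
    let cgus = partition(items, codegen_units);
    let objects: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = cgus.iter().map(|cgu| s.spawn(move || codegen(cgu))).collect();
        // Join in unit order, so the result does not depend on thread scheduling.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    objects.join(" + ")
}

fn main() {
    let items = ["main", "foo::<u32>", "foo::<i64>", "bar"];
    // prints "cgu.0.o[main,foo::<i64>] + cgu.1.o[foo::<u32>,bar]"
    println!("{}", build(&items, 2));
}
```

Tying the link order to the unit order (rather than to the order in which threads finish) keeps the output deterministic regardless of scheduling.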

## Data structures

The underlying thread-safe data structures used in the parallel compiler
can be found in the [`rustc_data_structures::sync`] module. These data structures
are implemented differently depending on whether `parallel-compiler` is enabled.

| data structure     | parallel                              | non-parallel              |
| ------------------ | ------------------------------------- | ------------------------- |
| `Lock<T>`          | `(parking_lot::Mutex<T>)`             | `(std::cell::RefCell<T>)` |
| `RwLock<T>`        | `(parking_lot::RwLock<T>)`            | `(std::cell::RefCell<T>)` |
| `MTLock<T>`        | `(Lock<T>)`                           | `(T)`                     |
| `ReadGuard`        | `parking_lot::RwLockReadGuard`        | `std::cell::Ref`          |
| `MappedReadGuard`  | `parking_lot::MappedRwLockReadGuard`  | `std::cell::Ref`          |
| `WriteGuard`       | `parking_lot::RwLockWriteGuard`       | `std::cell::RefMut`       |
| `MappedWriteGuard` | `parking_lot::MappedRwLockWriteGuard` | `std::cell::RefMut`       |
| `LockGuard`        | `parking_lot::MutexGuard`             | `std::cell::RefMut`       |
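
The point of this table is that compiler code is written once against names like `Lock<T>`, and the build configuration decides what they mean. Below is a minimal sketch of that pattern, assuming `std::sync::Mutex` as a dependency-free stand-in for `parking_lot::Mutex` and a hypothetical `sketch_parallel` cfg in place of rustc's actual build switch.

```rust
#[allow(dead_code)]
mod parallel {
    use std::sync::{Mutex, MutexGuard};

    /// Parallel flavor: a real mutex (std's Mutex stands in for parking_lot's).
    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }
        /// Blocks until no other thread holds the lock.
        pub fn lock(&self) -> MutexGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

#[allow(dead_code)]
mod serial {
    use std::cell::{RefCell, RefMut};

    /// Serial flavor: no atomics, just a dynamically checked borrow.
    pub struct Lock<T>(RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }
        /// Panics instead of blocking if the value is already borrowed.
        pub fn lock(&self) -> RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}

// Pick one flavor at build time; `sketch_parallel` is a hypothetical cfg name.
#[cfg(sketch_parallel)]
use parallel::Lock;
#[cfg(not(sketch_parallel))]
use serial::Lock;

fn main() {
    // The same call site compiles against either flavor.
    let counter = Lock::new(0u32);
    for _ in 0..3 {
        *counter.lock() += 1;
    }
    println!("{}", *counter.lock()); // prints 3
}
```

Because the serial flavor wraps a `RefCell`, it avoids atomic operations but is not `Sync`, so the type system rejects sharing it across threads in a serial build.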

- These thread-safe data structures are used throughout compilation, which
can cause lock contention and degrade performance as the number of
threads grows beyond 4. We therefore audit their use, which leads either to
refactoring that reduces shared state, or to persistent documentation
covering the specific invariants, atomicity requirements, and lock orderings.

- In addition, we still need to figure out which other invariants that hold
during serial compilation might not hold in parallel compilation.

[`rustc_data_structures::sync`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/index.html

### WorkerLocal

[`WorkerLocal`] is a special data structure implemented for the parallel compiler. It
holds a worker-local value for each thread in a thread pool. The worker-local
value can only be accessed through the `Deref` impl from a thread of the pool
it was constructed on; accessing it from any other thread panics.

`WorkerLocal` is used to implement the `Arena` allocator in the parallel
environment, which is critical for parallel queries. Its implementation is
located in the [`rustc_data_structures::sync::worker_local`] module. However,
in the non-parallel compiler, it is implemented as `OneThread<T>`, whose `T`
can be accessed directly through `Deref::deref`.

[`rustc_data_structures::sync::worker_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/index.html
[`WorkerLocal`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/struct.WorkerLocal.html
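
A toy version can make the access rule concrete. This is a sketch under stated assumptions, not rustc's implementation: `Pool` and `broadcast` are hypothetical stand-ins for rustc's rayon-based thread pool, each worker records its pool id and index in a thread-local, and the `Deref` impl uses that record either to return the worker's own slot or to panic. `into_inner` is likewise a hypothetical helper for collecting every slot once the parallel work is done.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

thread_local! {
    // (pool id, worker index) of the current thread, if it is a pool worker.
    static WORKER: Cell<Option<(usize, usize)>> = Cell::new(None);
}

static NEXT_POOL_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical toy thread pool with a fixed number of workers.
struct Pool {
    id: usize,
    threads: usize,
}

impl Pool {
    fn new(threads: usize) -> Self {
        Pool { id: NEXT_POOL_ID.fetch_add(1, Ordering::Relaxed), threads }
    }

    /// Run `f` once on every worker thread and wait for all of them.
    fn broadcast<F: Fn(usize) + Sync>(&self, f: F) {
        thread::scope(|s| {
            for index in 0..self.threads {
                let (f, id) = (&f, self.id);
                s.spawn(move || {
                    WORKER.with(|w| w.set(Some((id, index))));
                    f(index);
                });
            }
        });
    }
}

/// One `T` per worker thread of the pool it was created for.
struct WorkerLocal<T> {
    pool_id: usize,
    locals: Vec<T>,
}

impl<T> WorkerLocal<T> {
    fn new(pool: &Pool, init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal { pool_id: pool.id, locals: (0..pool.threads).map(init).collect() }
    }

    /// Take all per-worker values back out after the parallel work is done.
    fn into_inner(self) -> Vec<T> {
        self.locals
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    /// Return the current worker's slot; panic on any thread that is not a
    /// worker of the pool this `WorkerLocal` was constructed on.
    fn deref(&self) -> &T {
        match WORKER.with(|w| w.get()) {
            Some((pool, index)) if pool == self.pool_id => &self.locals[index],
            _ => panic!("WorkerLocal accessed outside its thread pool"),
        }
    }
}

fn main() {
    let pool = Pool::new(4);
    let counts = WorkerLocal::new(&pool, |_| AtomicUsize::new(0));
    pool.broadcast(|_| {
        // Each worker increments only its own slot.
        for _ in 0..1000 {
            counts.fetch_add(1, Ordering::Relaxed);
        }
    });
    let total: usize = counts.into_inner().into_iter().map(AtomicUsize::into_inner).sum();
    println!("{total}"); // prints 4000
}
```

Since `Deref` only hands out `&T` and the `WorkerLocal` is shared by reference across threads, this sketch stores `AtomicUsize` to stay free of `unsafe`; a real implementation could allow non-`Sync` slot types through an `unsafe impl Sync`, because no slot is ever touched by two threads.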

## Parallel iterator

The parallel iterators provided by the [`rayon`] crate are an easy way to
implement parallelism. In the current implementation of the parallel compiler,
we use a custom [fork][rustc-rayon] of `rayon` to run tasks in parallel.

Some iterator functions are implemented to run loops in parallel
when `parallel-compiler` is enabled.

| Function (`Send` and `Sync` bounds omitted) | Description | Owning module |
| ------------------------------------------- | ----------- | ------------- |
| `par_iter<T: IntoParallelIterator>(t: T) -> T::Iter` | generate a parallel iterator | `rustc_data_structures::sync` |
| `par_for_each_in<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item))` | generate a parallel iterator and run `for_each` on each element | `rustc_data_structures::sync` |
| `Map::par_body_owners(self, f: impl Fn(LocalDefId))` | run `f` on all body owners in the crate | `rustc_middle::hir::map` |
| `Map::par_for_each_module(self, f: impl Fn(LocalDefId))` | run `f` on all modules and submodules in the crate | `rustc_middle::hir::map` |
| `ModuleItems::par_items(&self, f: impl Fn(ItemId))` | run `f` on all items in the module | `rustc_middle::hir` |
| `ModuleItems::par_trait_items(&self, f: impl Fn(TraitItemId))` | run `f` on all trait items in the module | `rustc_middle::hir` |
| `ModuleItems::par_impl_items(&self, f: impl Fn(ImplItemId))` | run `f` on all impl items in the module | `rustc_middle::hir` |
| `ModuleItems::par_foreign_items(&self, f: impl Fn(ForeignItemId))` | run `f` on all foreign items in the module | `rustc_middle::hir` |
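
For intuition about what a helper like `par_for_each_in` does, here is a simplified sketch that uses only `std::thread::scope` rather than `rayon`. It is not rustc's implementation: the real function takes any `T: IntoParallelIterator` and is built on the rayon fork, whereas this hypothetical version takes a slice and a thread count and starts one scoped thread per chunk. In a non-parallel build, a helper like this would reduce to a plain sequential loop.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical stand-in for `par_for_each_in`: split `items` into at most
/// `threads` chunks and run `for_each` on every element, one scoped thread per
/// chunk. Returns only after every element has been visited.
fn par_for_each_in<T: Sync>(items: &[T], threads: usize, for_each: impl Fn(&T) + Sync) {
    let threads = threads.max(1);
    // Ceiling division yields at most `threads` chunks; `max(1)` avoids
    // `chunks(0)`, which would panic on an empty input.
    let chunk_len = ((items.len() + threads - 1) / threads).max(1);
    let for_each = &for_each;
    thread::scope(|s| {
        for chunk in items.chunks(chunk_len) {
            s.spawn(move || chunk.iter().for_each(for_each));
        }
    });
}

fn main() {
    let items: Vec<u64> = (1..=100).collect();
    let total = AtomicU64::new(0);
    par_for_each_in(&items, 4, |n| {
        total.fetch_add(*n, Ordering::Relaxed);
    });
    println!("{}", total.load(Ordering::Relaxed)); // prints 5050
}
```

Spawning one OS thread per chunk keeps the sketch short; a work-stealing pool such as `rayon` reuses its threads and rebalances uneven work, which this sketch does not attempt.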

There are a lot of loops in the compiler that could be parallelized
using these functions. As of <!-- date-check--> August 2022, the parallel
iterator functions are used in the following places:

| caller | scenario | callee |
| ------ | -------- | ------ |
| `rustc_metadata::rmeta::encoder::prefetch_mir` | Prefetch queries that will be needed later by metadata encoding | `par_iter` |
| `rustc_monomorphize::collector::collect_crate_mono_items` | Collect monomorphized items reachable from non-generic items | `par_for_each_in` |
| `rustc_interface::passes::analysis` | Check the validity of match expressions | `Map::par_body_owners` |
| `rustc_interface::passes::analysis` | MIR borrow check | `Map::par_body_owners` |
| `rustc_typeck::check::typeck_item_bodies` | Type check | `Map::par_body_owners` |
| `rustc_interface::passes::hir_id_validator::check_crate` | Check the validity of HIR | `Map::par_for_each_module` |
| `rustc_interface::passes::analysis` | Check the validity of loop bodies, attributes, naked functions, unstable ABIs, and const bodies | `Map::par_for_each_module` |
| `rustc_interface::passes::analysis` | Liveness and intrinsic checking of MIR | `Map::par_for_each_module` |
| `rustc_interface::passes::analysis` | Dead-code ("deathness") checking | `Map::par_for_each_module` |
| `rustc_interface::passes::analysis` | Privacy checking | `Map::par_for_each_module` |
| `rustc_lint::late::check_crate` | Run per-module lints | `Map::par_for_each_module` |
| `rustc_typeck::check_crate` | Well-formedness checking | `Map::par_for_each_module` |

There are still many loops that have the potential to use parallel iterators.

## Query system

The query model has some properties that make it feasible to evaluate
multiple queries in parallel without too much effort:

- A query provider can only access data via the query context, so
the query context can take care of synchronizing access.
- Query results are required to be immutable, so they can safely be used by
different threads concurrently.

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock, and
we are done.
- If there is no cache entry and no other active query invocation computing the
same result, we mark the key as being "in progress", release the lock, and
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock and block the thread until the other invocation has
computed the result we are waiting for. **Cycle error detection** in the parallel
compiler requires more complex logic than in single-threaded mode. When
worker threads in parallel queries stop making progress because they are waiting
on each other, the compiler uses an extra thread (the *deadlock handler*) to
detect, remove, and report the cycle error.
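
The locking protocol above can be sketched with a `Mutex`-protected table and a `Condvar`. This is a minimal illustration, not rustc's query engine: `QueryCache`, `Entry`, and `get_or_compute` are hypothetical names, a single condition variable wakes all waiters whenever any key finishes, and there is no cycle detection, so a provider that directly or indirectly requests its own key would block forever. A panicking provider would also leave its key marked "in progress".

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// State of one key in the cache table.
enum Entry<V> {
    InProgress,
    Done(V),
}

/// Hypothetical cache table for a single query such as `foo`.
struct QueryCache<K, V> {
    table: Mutex<HashMap<K, Entry<V>>>,
    finished: Condvar,
}

impl<K: Eq + Hash + Clone, V: Clone> QueryCache<K, V> {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get_or_compute(&self, key: K, provider: impl FnOnce(&K) -> V) -> V {
        let mut table = self.table.lock().unwrap(); // lock the cache table
        loop {
            match table.get(&key) {
                // A result exists: clone it; the lock is released on return.
                Some(Entry::Done(value)) => return value.clone(),
                // Nobody is computing it yet: claim it below.
                None => break,
                // Another invocation is in progress: fall through and wait.
                Some(Entry::InProgress) => {}
            }
            // `wait` releases the lock while blocked and re-takes it on wake-up.
            table = self.finished.wait(table).unwrap();
        }
        table.insert(key.clone(), Entry::InProgress); // mark as "in progress"
        drop(table); // release the lock before running the provider
        let value = provider(&key);
        self.table.lock().unwrap().insert(key, Entry::Done(value.clone()));
        self.finished.notify_all(); // wake every thread waiting on any key
        value
    }
}

fn main() {
    let cache: QueryCache<&str, usize> = QueryCache::new();
    let provider_runs = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..8 {
            s.spawn(|| {
                let len = cache.get_or_compute("type_of(foo)", |key| {
                    provider_runs.fetch_add(1, Ordering::SeqCst);
                    key.len()
                });
                assert_eq!(len, 12);
            });
        }
    });
    // Eight threads asked for the same key, but the provider ran only once.
    println!("{}", provider_runs.load(Ordering::SeqCst)); // prints 1
}
```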

Parallel queries still have outstanding implementation work, most of which is
related to the previous sections on data structures and parallel iterators. See
[this open feature tracking issue][tracking].

## Rustdoc

As of <!-- date-check--> November 2022, there are still a number of steps to
complete before `rustdoc` rendering can be made parallel (see an open discussion
of [parallel `rustdoc`][parallel-rustdoc]).

## Resources

Here are some resources that can be used to learn more:

- [This IRLO thread by alexcrichton about performance][irlo1]
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
- [This list of interior mutability in the compiler by nikomatsakis][imlist]

[`rayon`]: https://crates.io/crates/rayon
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
[monomorphization]: backend/monomorph.md
[parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741
[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
[tracking]: https://github.com/rust-lang/rust/issues/48685