| # Parallel compilation |
| |
| <div class="warning"> |
| As of <!-- date-check --> November 2024, |
| the parallel front-end is undergoing significant changes, |
| so this page contains quite a bit of outdated information. |
| |
| Tracking issue: <https://github.com/rust-lang/rust/issues/113349> |
| </div> |
| |
| As of <!-- date-check --> November 2024, most of the Rust compiler is now |
| parallelized. |
| |
| - The codegen part is executed concurrently by default. You can use the `-C |
| codegen-units=n` option to control the number of concurrent tasks. |
| - The parts between HIR lowering and codegen, such as type checking, borrow |
| checking, and MIR optimization, are parallelized in the nightly version. |
| Currently, they run serially by default, and users enable parallelization |
| manually with the `-Z threads=n` option. |
| - Other parts, such as lexing, parsing, HIR lowering, and macro expansion, are |
| still executed in serial mode. |
| |
| <div class="warning"> |
| The following sections are kept for now but are quite outdated. |
| </div> |
| |
| --- |
| |
| [codegen]: backend/codegen.md |
| |
| ## Code generation |
| |
| During [monomorphization] the compiler splits up all the code to |
| be generated into smaller chunks called _codegen units_. These are then generated by |
| independent instances of LLVM running in parallel. At the end, the linker |
| is run to combine all the codegen units together into one binary. This process |
| occurs in the [`rustc_codegen_ssa::base`] module. |
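| |
| As a rough illustration of the split, compile, and link flow described above, |
| the following sketch partitions a list of items into units, processes each unit |
| on its own thread, and joins the results in order. `CodegenUnit`, `partition`, |
| `compile`, and `build` are hypothetical names invented for this example, and |
| scoped threads and strings stand in for LLVM instances and object files. It is |
| not how `rustc_codegen_ssa` is implemented. |
| |
```rust
use std::thread;

/// Hypothetical stand-in for one codegen unit: a named group of items.
struct CodegenUnit {
    name: String,
    items: Vec<String>,
}

/// Splits `items` into at most `n` units of roughly equal size.
fn partition(items: &[&str], n: usize) -> Vec<CodegenUnit> {
    let n = n.max(1);
    // Round up so that no more than `n` chunks are produced.
    let chunk_size = ((items.len() + n - 1) / n).max(1);
    items
        .chunks(chunk_size)
        .enumerate()
        .map(|(i, chunk)| CodegenUnit {
            name: format!("cgu.{i}"),
            items: chunk.iter().map(|s| s.to_string()).collect(),
        })
        .collect()
}

/// Pretends to compile one unit into an "object file" (here just a string).
fn compile(unit: &CodegenUnit) -> String {
    format!("{}:[{}]", unit.name, unit.items.join(","))
}

/// Compiles every unit on its own thread, then "links" the results in order.
fn build(items: &[&str], n: usize) -> String {
    let units = partition(items, n);
    let objects: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || compile(unit)))
            .collect();
        // Joining in spawn order keeps the final output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    objects.join(" ")
}

fn main() {
    println!("{}", build(&["a", "b", "c", "d", "e"], 2));
}
```
| |
| The units are independent of each other, so the only synchronization point in |
| this sketch is the final join before the "link" step. |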
| |
| [`rustc_codegen_ssa::base`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/index.html |
| |
| ## Data structures |
| |
| The underlying thread-safe data structures used in the parallel compiler |
| can be found in the [`rustc_data_structures::sync`] module. These data structures |
| are implemented differently depending on whether `parallel-compiler` is true. |
| |
| | data structure | parallel | non-parallel | |
| | -------------------------------- | --------------------------------------------------- | ------------ | |
| | Lock\<T> | (parking_lot::Mutex\<T>) | (std::cell::RefCell) | |
| | RwLock\<T> | (parking_lot::RwLock\<T>) | (std::cell::RefCell) | |
| | MTLock\<T> | (Lock\<T>) | (T) | |
| | ReadGuard | parking_lot::RwLockReadGuard | std::cell::Ref | |
| | MappedReadGuard | parking_lot::MappedRwLockReadGuard | std::cell::Ref | |
| | WriteGuard | parking_lot::RwLockWriteGuard | std::cell::RefMut | |
| | MappedWriteGuard | parking_lot::MappedRwLockWriteGuard | std::cell::RefMut | |
| | LockGuard | parking_lot::MutexGuard | std::cell::RefMut | |
| |
| - These thread-safe data structures are used throughout compilation, which |
| can cause lock contention and degrade performance as the number of |
| threads increases beyond 4. We therefore audit the use of these data structures, |
| which leads either to a refactoring that reduces the use of shared state |
| or to persistent documentation covering the specifics of the |
| invariants, the atomicity, and the lock orderings. |
| |
| - On the other hand, we still need to figure out what other invariants |
| during compilation might not hold in parallel compilation. |
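| |
| To make the table above more concrete, here is a minimal sketch of a `Lock<T>` |
| type with one API and two backing implementations. It makes several assumptions: |
| `std::sync::Mutex` stands in for `parking_lot::Mutex` so that no external crate |
| is needed, the two variants live in hypothetical `parallel` and `serial` modules |
| rather than being selected at build time, and the helper functions are invented |
| for the demonstration. |
| |
```rust
use std::thread;

/// Stand-in for the parallel variant. Assumption: `std::sync::Mutex` replaces
/// `parking_lot::Mutex` so that the sketch needs no external crates.
mod parallel {
    use std::sync::{Mutex, MutexGuard};

    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }

        /// Blocks until the lock is free, then grants exclusive access.
        pub fn lock(&self) -> MutexGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

/// Stand-in for the non-parallel variant: only a dynamic borrow check.
mod serial {
    use std::cell::{RefCell, RefMut};

    pub struct Lock<T>(RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }

        /// Panics if the value is already borrowed; never blocks.
        pub fn lock(&self) -> RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}

/// Increments a shared counter once from each of `threads` threads.
fn count_in_parallel(threads: u32) -> u32 {
    let counter = parallel::Lock::new(0u32);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                *counter.lock() += 1;
            });
        }
    });
    let total = *counter.lock();
    total
}

/// The same call sites work unchanged with the single-threaded variant.
fn count_serially(steps: u32) -> u32 {
    let counter = serial::Lock::new(0u32);
    for _ in 0..steps {
        *counter.lock() += 1;
    }
    let total = *counter.lock();
    total
}

fn main() {
    println!("parallel: {}", count_in_parallel(4));
    println!("serial: {}", count_serially(4));
}
```
| |
| Because both variants expose the same `lock` method, calling code does not need |
| to know which one is in use. Only the parallel variant can be shared across |
| threads, since `RefCell` is not `Sync`. |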
| |
| [`rustc_data_structures::sync`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/index.html |
| |
| ### WorkerLocal |
| |
| [`WorkerLocal`] is a special data structure implemented for the parallel compiler. It |
| holds worker-local values for each thread in a thread pool. You can only |
| access the worker-local value through the `Deref` `impl` from a thread in the thread |
| pool it was constructed on. Any other access panics. |
| |
| `WorkerLocal` is used to implement the `Arena` allocator in the parallel |
| environment, which is critical in parallel queries. Its implementation is |
| located in the [`rustc_data_structures::sync::worker_local`] module. However, |
| in the non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T` |
| can be accessed directly through `Deref::deref`. |
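| |
| The idea of one slot per worker thread, reached through `Deref`, can be sketched |
| as follows. This is a simplified model under stated assumptions, not the actual |
| `rustc_data_structures` code: the `WORKER_INDEX` thread-local and `run_workers` |
| function are hypothetical, the sketch requires `T: Sync` instead of using unsafe |
| code, and it does not check which pool a thread belongs to. |
| |
```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

thread_local! {
    /// Index of the current thread inside the (hypothetical) pool, if any.
    static WORKER_INDEX: Cell<Option<usize>> = Cell::new(None);
}

/// Holds one value per worker thread.
struct WorkerLocal<T> {
    slots: Vec<T>,
}

impl<T> WorkerLocal<T> {
    /// Creates one slot for each of `workers` threads.
    fn new(workers: usize, mut init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal {
            slots: (0..workers).map(|i| init(i)).collect(),
        }
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    /// Returns the slot of the calling worker; panics on any other thread.
    fn deref(&self) -> &T {
        let index = WORKER_INDEX
            .with(|i| i.get())
            .expect("WorkerLocal accessed outside of its thread pool");
        &self.slots[index]
    }
}

/// Runs `workers` threads that each bump only their own counter.
fn run_workers(workers: usize, tasks_per_worker: usize) -> Vec<usize> {
    let counters = WorkerLocal::new(workers, |_| AtomicUsize::new(0));
    thread::scope(|s| {
        for index in 0..workers {
            let counters = &counters;
            s.spawn(move || {
                // Register this thread as worker number `index`.
                WORKER_INDEX.with(|i| i.set(Some(index)));
                for _ in 0..tasks_per_worker {
                    // Auto-deref goes through `Deref` to this worker's slot.
                    counters.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counters
        .slots
        .iter()
        .map(|c| c.load(Ordering::Relaxed))
        .collect()
}

fn main() {
    println!("{:?}", run_workers(3, 5));
}
```
| |
| Since every worker touches only its own slot, the workers never contend with |
| each other, which is the property that makes this pattern attractive for |
| per-thread arenas. |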
| |
| [`rustc_data_structures::sync::worker_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/index.html |
| [`WorkerLocal`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/struct.WorkerLocal.html |
| |
| ## Parallel iterator |
| |
| The parallel iterators provided by the [`rayon`] crate are an easy way to |
| implement parallelism. In the current implementation of the parallel compiler |
| we use a custom [fork][rustc-rayon] of `rayon` to run tasks in parallel. |
| |
| Some iterator functions are implemented to run loops in parallel |
| when `parallel-compiler` is true. |
| |
| | Function(Omit `Send` and `Sync`) | Introduction | Owning Module | |
| | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- | |
| **par_iter**\<T: IntoParallelIterator>(t: T) -> T::Iter | generate a parallel iterator | rustc_data_structures::sync | |
| **par_for_each_in**\<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) | generate a parallel iterator and run `for_each` on each element | rustc_data_structures::sync | |
| **Map::par_body_owners**(self, f: impl Fn(LocalDefId)) | run `f` on all HIR owners in the crate | rustc_middle::hir::map | |
| **Map::par_for_each_module**(self, f: impl Fn(LocalDefId)) | run `f` on all modules and submodules in the crate | rustc_middle::hir::map | |
| | **ModuleItems::par_items**(&self, f: impl Fn(ItemId)) | run `f` on all items in the module | rustc_middle::hir | |
| | **ModuleItems::par_trait_items**(&self, f: impl Fn(TraitItemId)) | run `f` on all trait items in the module | rustc_middle::hir | |
| | **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir | |
| | **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir | |
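| |
| The shape of a helper like `par_for_each_in` can be illustrated with the sketch |
| below, which offers a serial and a parallel variant behind similar signatures. |
| It rests on assumptions: scoped threads from the standard library stand in for |
| the `rayon` fork, the signatures take a `Vec<T>` and an explicit thread count |
| instead of `IntoParallelIterator`, and `sum_of_squares` is a hypothetical caller. |
| |
```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Serial variant: runs `for_each` on every element, one after another.
fn for_each_in<T>(items: Vec<T>, for_each: impl Fn(T)) {
    items.into_iter().for_each(for_each);
}

/// Parallel variant. Assumption: scoped threads with fixed-size chunks replace
/// rayon's work-stealing pool, and the signature is simplified.
fn par_for_each_in<T: Send>(items: Vec<T>, threads: usize, for_each: impl Fn(T) + Sync) {
    let threads = threads.max(1);
    let chunk_size = ((items.len() + threads - 1) / threads).max(1);
    let mut items = items.into_iter();
    thread::scope(|s| {
        loop {
            // Hand the next `chunk_size` elements to a fresh thread.
            let chunk: Vec<T> = items.by_ref().take(chunk_size).collect();
            if chunk.is_empty() {
                break;
            }
            let for_each = &for_each;
            s.spawn(move || chunk.into_iter().for_each(for_each));
        }
    });
}

/// Sums the squares of 1..=100, serially (`None`) or on `Some(n)` threads.
fn sum_of_squares(threads: Option<usize>) -> u64 {
    let numbers: Vec<u64> = (1..=100).collect();
    let total = AtomicU64::new(0);
    // The callback only needs `Fn`, so shared state must be thread-safe.
    let add_square = |n: u64| {
        total.fetch_add(n * n, Ordering::Relaxed);
    };
    match threads {
        Some(threads) => par_for_each_in(numbers, threads, add_square),
        None => for_each_in(numbers, add_square),
    }
    total.load(Ordering::Relaxed)
}

fn main() {
    println!("serial:   {}", sum_of_squares(None));
    println!("parallel: {}", sum_of_squares(Some(4)));
}
```
| |
| The callback is bound by `Fn` rather than `FnMut`, so anything it mutates has |
| to go through a thread-safe type such as an atomic or a lock. |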
| |
| There are a lot of loops in the compiler which can possibly be parallelized |
| using these functions. As of <!-- date-check--> August 2022, scenarios where |
| the parallel iterator function has been used are as follows: |
| |
| | caller | scenario | callee | |
| | ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ | |
| | rustc_metadata::rmeta::encoder::prefetch_mir | Prefetch queries which will be needed later by metadata encoding | par_iter | |
| | rustc_monomorphize::collector::collect_crate_mono_items | Collect monomorphized items reachable from non-generic items | par_for_each_in | |
| | rustc_interface::passes::analysis | Check the validity of the match statements | Map::par_body_owners | |
| | rustc_interface::passes::analysis | MIR borrow check | Map::par_body_owners | |
| | rustc_typeck::check::typeck_item_bodies | Type check | Map::par_body_owners | |
| rustc_interface::passes::hir_id_validator::check_crate | Check the validity of HIR | Map::par_for_each_module | |
| rustc_interface::passes::analysis | Check the validity of loop bodies, attributes, naked functions, unstable ABI, const bodies | Map::par_for_each_module | |
| rustc_interface::passes::analysis | Liveness and intrinsic checking of MIR | Map::par_for_each_module | |
| rustc_interface::passes::analysis | Dead code checking | Map::par_for_each_module | |
| | rustc_interface::passes::analysis | Privacy checking | Map::par_for_each_module | |
| | rustc_lint::late::check_crate | Run per-module lints | Map::par_for_each_module | |
| | rustc_typeck::check_crate | Well-formedness checking | Map::par_for_each_module | |
| |
| There are still many loops that have the potential to use parallel iterators. |
| |
| ## Query system |
| |
| The query model has some properties that make it actually feasible to evaluate |
| multiple queries in parallel without too much effort: |
| |
| - All data a query provider can access is via the query context, so |
| the query context can take care of synchronizing access. |
| - Query results are required to be immutable so they can safely be used by |
| different threads concurrently. |
| |
| When a query `foo` is evaluated, the cache table for `foo` is locked. |
| |
| - If there already is a result, we can clone it, release the lock and |
| we are done. |
| - If there is no cache entry and no other active query invocation computing the |
| same result, we mark the key as being "in progress", release the lock and |
| start evaluating. |
| - If there *is* another query invocation for the same key in progress, we |
| release the lock and block the thread until the other invocation has |
| computed the result we are waiting for. |
| |
| **Cycle error detection** in the parallel compiler requires more complex logic |
| than in single-threaded mode. When worker threads in parallel queries stop |
| making progress because they depend on one another, the compiler uses an extra |
| thread (named the *deadlock handler*) to detect, remove, and report the cycle error. |
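| |
| The three cache cases described above (cached result, no entry, entry in |
| progress) can be sketched with a mutex-protected table and a condition variable. |
| This is a simplified model under stated assumptions: `QueryCache`, `Entry`, and |
| `demo` are hypothetical names, keys and results are plain integers, and there is |
| no cycle detection or panic handling, so it should not be read as the actual |
| query system implementation. |
| |
```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// State of one key in the cache table.
#[derive(Clone, Copy)]
enum Entry {
    InProgress,
    Done(u64),
}

/// Hypothetical cache for a single query mapping `u32` keys to `u64` results.
struct QueryCache {
    table: Mutex<HashMap<u32, Entry>>,
    finished: Condvar,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache {
            table: Mutex::new(HashMap::new()),
            finished: Condvar::new(),
        }
    }

    /// Returns the result for `key`, running `provider` at most once per key.
    fn get(&self, key: u32, provider: impl FnOnce(u32) -> u64) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            let state = table.get(&key).copied();
            match state {
                // Cached: copy the result; the lock is released on return.
                Some(Entry::Done(value)) => return value,
                // Another thread is computing this key: release the lock
                // and block until some computation finishes, then re-check.
                Some(Entry::InProgress) => {
                    table = self.finished.wait(table).unwrap();
                }
                // Nobody has started yet: this thread does the work.
                None => break,
            }
        }
        table.insert(key, Entry::InProgress);
        drop(table); // Evaluate the provider without holding the lock.

        let value = provider(key);
        self.table.lock().unwrap().insert(key, Entry::Done(value));
        self.finished.notify_all();
        value
    }
}

/// Requests the same key from several threads and counts provider calls.
fn demo(threads: usize) -> (u64, usize) {
    let cache = QueryCache::new();
    let calls = AtomicUsize::new(0);
    let results: Vec<u64> = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads.max(1) {
            handles.push(s.spawn(|| {
                cache.get(7, |key| {
                    calls.fetch_add(1, Ordering::SeqCst);
                    u64::from(key) * 6
                })
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    assert!(results.iter().all(|&r| r == results[0]));
    (results[0], calls.load(Ordering::SeqCst))
}

fn main() {
    let (value, calls) = demo(8);
    println!("value = {value}, provider calls = {calls}");
}
```
| |
| In this sketch a thread that waits on a key which itself depends on the waiting |
| thread would block forever, which is the situation the deadlock handler exists |
| to resolve in the real compiler. |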
| |
| The parallel query feature still has work remaining, most of which is |
| related to the `Data structures` and `Parallel iterator` sections above. See [this |
| open feature tracking issue][tracking]. |
| |
| ## Rustdoc |
| |
| As of <!-- date-check--> November 2022, there are still a number of steps to |
| complete before `rustdoc` rendering can be made parallel (see an open discussion |
| of [parallel `rustdoc`][parallel-rustdoc]). |
| |
| ## Resources |
| |
| Here are some resources that can be used to learn more: |
| |
| - [This IRLO thread by alexcrichton about performance][irlo1] |
| - [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0] |
| - [This list of interior mutability in the compiler by nikomatsakis][imlist] |
| |
| [`rayon`]: https://crates.io/crates/rayon |
| [imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md |
| [irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606 |
| [irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503 |
| [monomorphization]: backend/monomorph.md |
| [parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741 |
| [rustc-rayon]: https://github.com/rust-lang/rustc-rayon |
| [tracking]: https://github.com/rust-lang/rust/issues/48685 |