Tracking issue: https://github.com/rust-lang/rust/issues/113349
As of November 2024, most of the Rust compiler is parallelized.
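The codegen part is executed concurrently by default; use the -C codegen-units=n option to control the number of concurrent tasks. The parts between HIR lowering and codegen, such as type checking, borrow checking, and MIR optimization, are parallelized on nightly but still run serially by default; parallelism is enabled manually with the -Z threads=n option.
During monomorphization the compiler splits up all the code to be generated into smaller chunks called codegen units. These are then generated by independent instances of LLVM running in parallel. At the end, the linker is run to combine all the codegen units together into one binary. This process occurs in the rustc_codegen_ssa::base module.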
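As a rough illustration of the partition, generate in parallel, then combine shape described above, the following sketch splits a list of item names into n units, "generates" each unit on its own scoped thread, and concatenates the results in unit order as a stand-in for the linker. The functions partition_into_units, codegen_unit, and compile, as well as the round-robin partitioning and string "objects", are hypothetical; real codegen units come from the monomorphization partitioner and are compiled by LLVM, not by code like this.

```rust
use std::thread;

/// Hypothetical partitioner: deal items round-robin into `n` codegen units.
fn partition_into_units(items: &[&str], n: usize) -> Vec<Vec<String>> {
    let n = n.max(1);
    let mut units = vec![Vec::new(); n];
    for (i, item) in items.iter().enumerate() {
        units[i % n].push(item.to_string());
    }
    units
}

/// Hypothetical stand-in for one LLVM instance compiling one unit.
fn codegen_unit(unit: &[String]) -> String {
    unit.iter().map(|item| format!("<obj:{item}>")).collect()
}

/// Generate every unit on its own thread, then "link" the outputs in unit order.
fn compile(items: &[&str], codegen_units: usize) -> String {
    let units = partition_into_units(items, codegen_units);
    let objects: Vec<String> = thread::scope(|s| {
        // Spawn all units first so they run concurrently...
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || codegen_unit(unit)))
            .collect();
        // ...then wait for each one, preserving unit order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Stand-in for the linker: combine all objects into one "binary".
    objects.concat()
}

fn main() {
    let items = ["main", "helper", "Vec::<u8>::push", "drop_in_place"];
    println!("{}", compile(&items, 2));
}
```

Because the outputs are joined in unit order rather than completion order, the combined result depends only on the partitioning, not on which thread happens to finish first.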
The underlying thread-safe data structures used in the parallel compiler can be found in the rustc_data_structures::sync module. These data structures are implemented differently depending on whether parallel-compiler is true.
data structure | parallel | non-parallel |
---|---|---|
Lock<T> | (parking_lot::Mutex<T>) | (std::cell::RefCell) |
RwLock<T> | (parking_lot::RwLock<T>) | (std::cell::RefCell) |
MTLock<T> | (Lock<T>) | (T) |
ReadGuard | parking_lot::RwLockReadGuard | std::cell::Ref |
MappedReadGuard | parking_lot::MappedRwLockReadGuard | std::cell::Ref |
WriteGuard | parking_lot::RwLockWriteGuard | std::cell::RefMut |
MappedWriteGuard | parking_lot::MappedRwLockWriteGuard | std::cell::RefMut |
LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
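To make the compile-time switch in the table concrete, here is a minimal sketch of a Lock<T> wrapper whose backing type is selected by a cfg flag. It assumes a parallel_compiler cfg (compile with --cfg parallel_compiler to select the thread-safe branch) and uses std::sync::Mutex in place of parking_lot::Mutex so that it needs no external crates; the module layout and method names are illustrative, not copied from rustc_data_structures::sync.

```rust
// Hypothetical sketch: one `Lock<T>` API, two backing types chosen at compile time.
#[cfg(parallel_compiler)]
mod lock_impl {
    use std::sync::{Mutex, MutexGuard};

    /// Thread-safe variant (std Mutex used here instead of parking_lot).
    pub struct Lock<T>(Mutex<T>);
    pub type LockGuard<'a, T> = MutexGuard<'a, T>;

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }
        pub fn lock(&self) -> LockGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

#[cfg(not(parallel_compiler))]
mod lock_impl {
    use std::cell::{RefCell, RefMut};

    /// Single-threaded variant: no atomics, only a dynamic borrow check.
    pub struct Lock<T>(RefCell<T>);
    pub type LockGuard<'a, T> = RefMut<'a, T>;

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }
        pub fn lock(&self) -> LockGuard<'_, T> {
            self.0.borrow_mut()
        }
    }
}

use lock_impl::Lock;

fn main() {
    let counter = Lock::new(0u32);
    // Each guard is a temporary dropped at the end of its statement.
    *counter.lock() += 1;
    *counter.lock() += 1;
    println!("{}", *counter.lock());
}
```

The calling code is identical in both configurations, which is what lets the rest of the compiler stay agnostic about whether it is built for parallel use.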
These thread-safe data structures are used throughout compilation, which can cause lock contention and degrade performance once the number of threads increases beyond 4. We therefore audit their use, which leads either to refactoring that reduces shared state, or to persistent documentation covering the specific invariants, atomicity, and lock ordering involved.
In addition, we still need to identify which other invariants relied on during compilation may not hold under parallel compilation.
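The following sketch, which is not taken from rustc, illustrates the "reduce shared state" kind of refactoring on a toy problem: counting words across threads. The hypothetical count_shared locks one shared Mutex<HashMap> for every item, so all threads contend on a single lock; the hypothetical count_local gives each thread a private map and merges the partial results once, after the threads finish. Only the standard library is used.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Contended version: every thread takes the same lock once per item.
fn count_shared(chunks: &[Vec<&str>]) -> HashMap<String, usize> {
    let shared: Mutex<HashMap<String, usize>> = Mutex::new(HashMap::new());
    thread::scope(|s| {
        for chunk in chunks {
            let shared = &shared;
            s.spawn(move || {
                for &word in chunk {
                    // Threads serialize here on every single item.
                    *shared.lock().unwrap().entry(word.to_string()).or_insert(0) += 1;
                }
            });
        }
    });
    shared.into_inner().unwrap()
}

/// Refactored version: private per-thread maps, merged once at the end.
fn count_local(chunks: &[Vec<&str>]) -> HashMap<String, usize> {
    let partials: Vec<HashMap<String, usize>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                s.spawn(move || {
                    // No shared state while the thread is working.
                    let mut local: HashMap<String, usize> = HashMap::new();
                    for &word in chunk {
                        *local.entry(word.to_string()).or_insert(0) += 1;
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Single-threaded merge after all workers are done: no lock needed.
    let mut total: HashMap<String, usize> = HashMap::new();
    for partial in partials {
        for (word, n) in partial {
            *total.entry(word).or_insert(0) += n;
        }
    }
    total
}

fn main() {
    let chunks = vec![vec!["fn", "let", "fn"], vec!["let", "impl"]];
    assert_eq!(count_shared(&chunks), count_local(&chunks));
    println!("{:?}", count_local(&chunks).get("fn"));
}
```

Both functions return the same counts; they differ only in how often threads must synchronize (once per item versus not at all until the final merge).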
WorkerLocal is a special data structure implemented for the parallel compiler. It holds a separate worker-local value for each thread in a thread pool. The worker-local value can only be accessed through the Deref impl from a thread of the pool it was constructed on; accessing it from any other thread panics.
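The access rule above can be modeled with a thread-local worker index. In this hypothetical sketch, each thread started by run_on_pool records its (pool id, worker index) in a thread_local!, and WorkerLocal::deref uses that index to pick the matching slot, panicking on any thread that is not a worker of the pool the value was built for. The names WorkerLocal::new and run_on_pool and the numeric pool-id scheme are assumptions for illustration, not rustc's implementation.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::thread;

thread_local! {
    // (pool id, worker index) of the current thread; None outside any pool.
    static CURRENT_WORKER: Cell<Option<(usize, usize)>> = Cell::new(None);
}

/// Hypothetical simplified WorkerLocal: one value per worker of one pool.
struct WorkerLocal<T> {
    pool_id: usize,
    values: Vec<T>,
}

impl<T> WorkerLocal<T> {
    fn new(pool_id: usize, num_workers: usize, init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal { pool_id, values: (0..num_workers).map(init).collect() }
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    fn deref(&self) -> &T {
        match CURRENT_WORKER.with(|w| w.get()) {
            // On a worker of the pool this value was built for: use its own slot.
            Some((pool, index)) if pool == self.pool_id => &self.values[index],
            // Any other thread (including another pool's workers) is a bug.
            _ => panic!("WorkerLocal accessed outside the thread pool it was constructed on"),
        }
    }
}

/// Hypothetical pool: runs `f` once on each of `num_workers` scoped threads.
fn run_on_pool<F: Fn(usize) + Sync>(pool_id: usize, num_workers: usize, f: F) {
    thread::scope(|s| {
        for index in 0..num_workers {
            let f = &f;
            s.spawn(move || {
                CURRENT_WORKER.with(|w| w.set(Some((pool_id, index))));
                f(index);
            });
        }
    });
}

fn main() {
    let local = WorkerLocal::new(7, 4, |i| format!("arena for worker {i}"));
    run_on_pool(7, 4, |i| {
        // Each worker sees only its own slot.
        assert_eq!(*local, format!("arena for worker {i}"));
    });
}
```

Because this sketch uses only safe code, sharing a WorkerLocal<T> across threads requires T: Sync; a design that hands each worker mutable access to its own slot would need unsafe code and a manual Sync impl.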
WorkerLocal is used to implement the Arena allocator in the parallel environment, which is critical in parallel queries. Its implementation is located in the rustc_data_structures::sync::worker_local module. However, in the non-parallel compiler, it is implemented as (OneThread<T>), whose T can be accessed directly through Deref::deref.
The parallel iterators provided by the rayon crate are an easy way to implement parallelism. The current implementation of the parallel compiler uses a custom fork of rayon to run tasks in parallel.
Some iterator functions are implemented to run loops in parallel when parallel-compiler is true.
Function (Send and Sync bounds omitted) | Description | Owning module |
---|---|---|
par_iter<T: IntoParallelIterator>(t: T) -> T::Iter | Generate a parallel iterator | rustc_data_structures::sync |
par_for_each_in<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) | Generate a parallel iterator and run for_each on each element | rustc_data_structures::sync |
Map::par_body_owners(self, f: impl Fn(LocalDefId)) | Run f on all HIR body owners in the crate | rustc_middle::hir::map |
Map::par_for_each_module(self, f: impl Fn(LocalDefId)) | Run f on all modules and submodules in the crate | rustc_middle::hir::map |
ModuleItems::par_items(&self, f: impl Fn(ItemId)) | Run f on all items in the module | rustc_middle::hir |
ModuleItems::par_trait_items(&self, f: impl Fn(TraitItemId)) | Run f on all trait items in the module | rustc_middle::hir |
ModuleItems::par_impl_items(&self, f: impl Fn(ImplItemId)) | Run f on all impl items in the module | rustc_middle::hir |
ModuleItems::par_foreign_items(&self, f: impl Fn(ForeignItemId)) | Run f on all foreign items in the module | rustc_middle::hir |
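Since rayon is an external crate, here is a self-contained stand-in: a hypothetical par_for_each_in built on std::thread::scope. It mirrors the shape of the table entry but takes a Vec<T> instead of any IntoParallelIterator, and adds an assumed threads parameter (loosely analogous to -Z threads=n). With one thread it degrades to an ordinary serial loop, matching the idea that these helpers only run in parallel when parallelism is enabled.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for `par_for_each_in`; `threads` is an assumed extra parameter.
fn par_for_each_in<T, F>(items: Vec<T>, threads: usize, for_each: F)
where
    T: Send,
    F: Fn(T) + Sync,
{
    if threads <= 1 {
        // Serial path: behaves exactly like a plain `for` loop.
        items.into_iter().for_each(for_each);
        return;
    }
    // Deal items round-robin into one bucket per thread.
    let mut buckets: Vec<Vec<T>> = (0..threads).map(|_| Vec::new()).collect();
    for (i, item) in items.into_iter().enumerate() {
        buckets[i % threads].push(item);
    }
    // Share the closure by reference; `F: Sync` makes `&F` sendable.
    let for_each = &for_each;
    thread::scope(|s| {
        for bucket in buckets {
            s.spawn(move || bucket.into_iter().for_each(for_each));
        }
    });
}

fn main() {
    let checked = Mutex::new(Vec::new());
    // Hypothetical per-module check: record which modules were visited.
    par_for_each_in(vec!["mod_a", "mod_b", "mod_c"], 2, |module| {
        checked.lock().unwrap().push(module);
    });
    let mut checked = checked.into_inner().unwrap();
    checked.sort();
    println!("{checked:?}");
}
```

Items are dealt into fixed buckets up front; unlike rayon, this sketch does no work stealing, so an uneven workload can leave some threads idle.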
Many loops in the compiler could potentially be parallelized using these functions. As of August 2022, the parallel iterator functions are used in the following places:
Caller | Scenario | Callee |
---|---|---|
rustc_metadata::rmeta::encoder::prefetch_mir | Prefetch queries which will be needed later by metadata encoding | par_iter |
rustc_monomorphize::collector::collect_crate_mono_items | Collect monomorphized items reachable from non-generic items | par_for_each_in |
rustc_interface::passes::analysis | Check the validity of match statements | Map::par_body_owners |
rustc_interface::passes::analysis | MIR borrow check | Map::par_body_owners |
rustc_typeck::check::typeck_item_bodies | Type check | Map::par_body_owners |
rustc_interface::passes::hir_id_validator::check_crate | Check the validity of HIR | Map::par_for_each_module |
rustc_interface::passes::analysis | Check the validity of loop bodies, attributes, naked functions, unstable ABI, and const bodies | Map::par_for_each_module |
rustc_interface::passes::analysis | Liveness and intrinsic checking of MIR | Map::par_for_each_module |
rustc_interface::passes::analysis | Dead code checking | Map::par_for_each_module |
rustc_interface::passes::analysis | Privacy checking | Map::par_for_each_module |
rustc_lint::late::check_crate | Run per-module lints | Map::par_for_each_module |
rustc_typeck::check_crate | Well-formedness checking | Map::par_for_each_module |
There are still many loops that have the potential to use parallel iterators.
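The query model has some properties that make it feasible to evaluate multiple queries in parallel without too much effort: all data a query provider can access goes through the query context, so the query context can take care of synchronizing access, and query results are required to be immutable, so they can safely be used by different threads concurrently.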
When a query foo is evaluated, the cache table for foo is locked.
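The locking discipline can be illustrated with a simplified model. This is not rustc's query engine: the QueryCache type, its Slot states, and the use of std Mutex and Condvar are assumptions made for illustration. With the table locked, the sketch handles three cases: a finished result is cloned and returned; an absent key is marked in progress, after which the lock is released before the provider runs; and a key already in progress on another thread makes the caller wait until that computation publishes its result.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Condvar, Mutex};

/// Hypothetical state of one cache entry.
enum Slot<V> {
    InProgress,
    Done(V),
}

/// Hypothetical cache table for a single query.
struct QueryCache<K, V> {
    table: Mutex<HashMap<K, Slot<V>>>,
    finished: Condvar,
}

impl<K: Hash + Eq + Clone, V: Clone> QueryCache<K, V> {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get_or_compute(&self, key: K, compute: impl FnOnce(&K) -> V) -> V {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key) {
                // Cached: clone the result; the lock is released on return.
                Some(Slot::Done(value)) => return value.clone(),
                // Another thread is computing this key: block, then re-check.
                Some(Slot::InProgress) => table = self.finished.wait(table).unwrap(),
                // Nobody has started: claim the key and stop looping.
                None => {
                    table.insert(key.clone(), Slot::InProgress);
                    break;
                }
            }
        }
        // Release the lock before running the (possibly slow) provider.
        drop(table);
        let value = compute(&key);
        self.table.lock().unwrap().insert(key, Slot::Done(value.clone()));
        // Wake every thread waiting on an in-progress key.
        self.finished.notify_all();
        value
    }
}

fn main() {
    let cache: QueryCache<&str, String> = QueryCache::new();
    let first = cache.get_or_compute("type_of(foo)", |key| format!("computed {key}"));
    // Second call hits the cache, so the provider closure never runs.
    let second = cache.get_or_compute("type_of(foo)", |_| unreachable!());
    assert_eq!(first, second);
    println!("{first}");
}
```

The sketch ignores panics and cycles: a query that transitively waits on its own key would block forever here, so cycle detection needs extra machinery not shown.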
The parallel query feature still has implementation work remaining, most of which relates to the data structures and parallel iterators described above. See this open feature tracking issue.
As of November 2022, there are still a number of steps to complete before rustdoc rendering can be made parallel (see an open discussion of parallel rustdoc).
Here are some resources that can be used to learn more: