As of September 2021, the only stage of the compiler that is already parallel is codegen. The nightly compiler also implements parallel query evaluation, but there is still a lot of work to be done. The lack of parallelism at other stages represents an opportunity for improving compiler performance. One can try out the current parallel compiler work by enabling it in config.toml.
The next few sections describe where and how parallelism is currently used, and the current status of making parallel compilation the default in rustc.
The underlying thread-safe data structures used in the parallel compiler can be found in the rustc_data_structures::sync module. Some of these data structures also use the parking_lot crate.
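As an illustration of how one module can serve both the serial and the parallel build, here is a minimal sketch of a pointer alias that is selected at compile time. It assumes a cfg flag named parallel_compiler that the build system sets; the share helper is hypothetical and exists only for this example, and the real definitions in rustc_data_structures::sync may differ.

```rust
// Sketch only: pick the reference-counted pointer type at compile time.
// `parallel_compiler` is assumed to be a cfg flag set by the build system;
// a plain `rustc` invocation leaves it unset and may warn that it is unknown.
#[cfg(parallel_compiler)]
pub type Lrc<T> = std::sync::Arc<T>; // atomic counts, shareable across threads

#[cfg(not(parallel_compiler))]
pub type Lrc<T> = std::rc::Rc<T>; // cheaper non-atomic counts, one thread only

/// Hypothetical helper: callers write `Lrc<T>` and never name `Rc` or `Arc`.
pub fn share<T>(value: T) -> Lrc<T> {
    Lrc::new(value)
}
```

With the flag unset, Lrc<T> is Rc<T>, so cloning only bumps a non-atomic counter. Code written against the alias compiles unchanged in either configuration as long as it sticks to the API that Rc and Arc have in common.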
There are two underlying thread-safe data structures used in code generation:
- Lrc
- MetadataRef -> OwningRef<Box<dyn Erased + Send + Sync>, [u8]> (specific to rustc)
During monomorphization, the compiler splits up all the code to be generated into smaller chunks called codegen units. These are then generated by independent instances of LLVM running in parallel. At the end, the linker is run to combine all the codegen units together into one binary. This process occurs in the rustc_codegen_ssa::base module.
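The partition, generate, and link flow described above can be sketched with scoped threads. This is a toy model, not the code in rustc_codegen_ssa::base: CodegenUnit, partition, codegen, and compile are hypothetical names, strings stand in for object files, and plain threads stand in for LLVM instances.

```rust
use std::thread;

/// Hypothetical stand-in for a codegen unit: a named group of items.
#[derive(Debug, Clone)]
pub struct CodegenUnit {
    pub name: String,
    pub items: Vec<String>,
}

/// Split `items` into at most `n_units` roughly equal codegen units.
pub fn partition(items: &[String], n_units: usize) -> Vec<CodegenUnit> {
    let n_units = n_units.max(1);
    // Ceiling division so every item lands in some unit.
    let chunk_len = ((items.len() + n_units - 1) / n_units).max(1);
    items
        .chunks(chunk_len)
        .enumerate()
        .map(|(i, chunk)| CodegenUnit {
            name: format!("cgu.{i}"),
            items: chunk.to_vec(),
        })
        .collect()
}

/// Stand-in for one backend instance turning a unit into an "object file".
fn codegen(unit: &CodegenUnit) -> String {
    format!("{}:[{}]", unit.name, unit.items.join(","))
}

/// Generate every unit on its own thread, then "link" the results in order.
pub fn compile(items: &[String], n_units: usize) -> String {
    let units = partition(items, n_units);
    let objects: Vec<String> = thread::scope(|s| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || codegen(unit)))
            .collect();
        // Joining in spawn order keeps the output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Stand-in for the final linker step.
    objects.join(" + ")
}
```

Each unit is independent of the others, which is why this stage parallelizes without shared mutable state: the only synchronization point is the join before the link step.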
The query model has some properties that make it feasible to evaluate multiple queries in parallel without too much effort:
When a query foo is evaluated, the cache table for foo is locked.
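A minimal sketch of a locked cache table follows. The text above only states that the table for a query is locked during evaluation; the in-progress marker, the condition variable used for waiting, and all names (QueryCache, Entry, get_or_compute) are assumptions made for illustration, not rustc's implementation.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

/// State of one cache entry (hypothetical names).
#[derive(Clone, Copy)]
enum Entry {
    InProgress,
    Done(u64),
}

/// The cache table of a single query, guarded by one lock.
pub struct QueryCache {
    table: Mutex<HashMap<u32, Entry>>,
    finished: Condvar,
}

impl QueryCache {
    pub fn new() -> Self {
        QueryCache {
            table: Mutex::new(HashMap::new()),
            finished: Condvar::new(),
        }
    }

    /// Return the result for `key`, running `provider` at most once per key
    /// even when several threads ask for it concurrently.
    pub fn get_or_compute(&self, key: u32, provider: impl FnOnce(u32) -> u64) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key).copied() {
                // Cached: hand back the value; the lock is released on return.
                Some(Entry::Done(value)) => return value,
                // Another thread is computing it: sleep until it notifies us.
                Some(Entry::InProgress) => table = self.finished.wait(table).unwrap(),
                // Nobody has started yet: claim the key and go compute.
                None => {
                    table.insert(key, Entry::InProgress);
                    break;
                }
            }
        }
        // Run the provider without holding the lock.
        drop(table);
        let value = provider(key);
        self.table.lock().unwrap().insert(key, Entry::Done(value));
        self.finished.notify_all();
        value
    }
}
```

The provider runs outside the lock, so unrelated keys can be computed concurrently. The sketch assumes that a query never depends on itself and that providers do not panic; either case would leave waiting threads blocked.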
As of September 2021, there are still a number of steps to complete before rustdoc rendering can be made parallel. More details on this issue can be found here.
As of July 2021, work on explicitly parallelizing the compiler has stalled. There is a lot of design and correctness work that needs to be done.
These are the basic ideas in the effort to make rustc parallel:
- Use (a custom fork of) rayon to run tasks in parallel. The custom fork allows the execution of DAGs of tasks, not just trees.
- Convert interior-mutable data structures (e.g. Cell) into their thread-safe siblings (e.g. Mutex).
As of February 2021, much of this effort is on hold due to lack of manpower. We have a working prototype with promising performance gains in many cases. However, there are two blockers:
It's not clear what invariants need to be upheld that might not hold in the face of concurrency. An auditing effort was underway, but seems to have stalled at some point.
There is a lot of lock contention, which actually degrades performance as the number of threads increases beyond 4.
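To make the Cell-to-Mutex conversion mentioned above concrete, here is a minimal sketch using only the standard library. SerialCounter, SharedCounter, and allocate_in_parallel are hypothetical names, and std::thread::scope stands in for the rayon fork. Because every call to fresh takes the same lock, the pattern also hints at where contention of the kind described in the second blocker can come from.

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::thread;

/// Single-threaded version: `Cell` gives interior mutability but is not
/// `Sync`, so a `&SerialCounter` cannot be shared across threads.
pub struct SerialCounter {
    next_id: Cell<u32>,
}

impl SerialCounter {
    pub fn new() -> Self {
        SerialCounter { next_id: Cell::new(0) }
    }

    pub fn fresh(&self) -> u32 {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        id
    }
}

/// Thread-safe sibling: same API, with `Mutex` in place of `Cell`.
pub struct SharedCounter {
    next_id: Mutex<u32>,
}

impl SharedCounter {
    pub fn new() -> Self {
        SharedCounter { next_id: Mutex::new(0) }
    }

    pub fn fresh(&self) -> u32 {
        // Every caller serializes on this lock.
        let mut guard = self.next_id.lock().unwrap();
        let id = *guard;
        *guard += 1;
        id
    }
}

/// Hand out `per_thread` ids on each of `n_threads` threads; return them sorted.
pub fn allocate_in_parallel(n_threads: usize, per_thread: usize) -> Vec<u32> {
    let counter = SharedCounter::new();
    let mut ids = Vec::new();
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..n_threads {
            handles.push(s.spawn(|| {
                (0..per_thread).map(|_| counter.fresh()).collect::<Vec<u32>>()
            }));
        }
        for handle in handles {
            ids.extend(handle.join().unwrap());
        }
    });
    ids.sort_unstable();
    ids
}
```

The conversion keeps the method signatures intact, which is what makes it attractive as a mechanical change. The cost is that a lock taken on a hot path is shared by all threads, so adding threads can add waiting rather than throughput.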
Here are some resources that can be used to learn more (note that some of them are a bit out of date):