Parallel Compilation

As of September 2021, the only stage of the compiler that is already parallel is codegen. The nightly compiler implements parallel query evaluation, but there is still a lot of work to be done. The lack of parallelism at other stages also represents an opportunity for improving compiler performance. One can try out the current parallel compiler work by enabling it in config.toml.

These next few sections describe where and how parallelism is currently used, and the current status of making parallel compilation the default in rustc.

The underlying thread-safe data structures used in the parallel compiler can be found in the rustc_data_structures::sync module. Some of these data structures are built on the parking_lot crate.

Codegen

There are two underlying thread-safe data structures used in code generation:

Lrc
- An Arc if parallel_compiler is true, and an Rc if it is not.
MetadataRef -> OwningRef<Box<dyn Erased + Send + Sync>, [u8]>
- This data structure is specific to rustc.
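The Lrc switch can be pictured as a type alias chosen at compile time. The sketch below assumes a custom `parallel_compiler` cfg flag (for example passed as `--cfg parallel_compiler`); the actual definition in rustc_data_structures::sync may be written differently, and `share_twice` is a hypothetical helper for illustration only.

```rust
// Sketch: pick the reference-counted pointer type at compile time.
// `parallel_compiler` is an assumed custom cfg flag; without it we fall back
// to Rc, which avoids the cost of atomic reference counting.
#[cfg(parallel_compiler)]
pub use std::sync::Arc as Lrc;
#[cfg(not(parallel_compiler))]
pub use std::rc::Rc as Lrc;

/// Hypothetical helper: code written against `Lrc` compiles unchanged in
/// both configurations because Rc and Arc expose the same core API.
pub fn share_twice(data: Vec<u8>) -> (Lrc<Vec<u8>>, Lrc<Vec<u8>>) {
    let first = Lrc::new(data);
    let second = Lrc::clone(&first); // bumps the refcount, no deep copy
    (first, second)
}

fn main() {
    let (a, b) = share_twice(vec![1, 2, 3]);
    // Both handles point at the same allocation.
    assert!(Lrc::ptr_eq(&a, &b));
    assert_eq!(Lrc::strong_count(&a), 2);
    println!("{:?}", b);
}
```

The design choice is that single-threaded builds pay nothing for thread safety they do not need, while parallel builds get a pointer that is Send and Sync (when its contents are).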

During monomorphization, the compiler splits all the code to be generated into smaller chunks called codegen units. Independent instances of LLVM then process these units in parallel. At the end, the linker combines all the codegen units into one binary. This process occurs in the rustc_codegen_ssa::base module.
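The split, compile-in-parallel, then link shape can be sketched with scoped threads. This is a toy model under stated assumptions: `compile_unit` is a hypothetical stand-in for an LLVM instance, items are split into equal-sized chunks (rustc's actual partitioning strategy is different and more involved), and "linking" is just concatenation.

```rust
use std::thread;

/// Hypothetical stand-in for running LLVM on one codegen unit:
/// each item name is turned into a fake "object" string.
fn compile_unit(unit: &[&str]) -> Vec<String> {
    unit.iter().map(|item| format!("obj({item})")).collect()
}

/// Split `items` into at most `n_units` codegen units, compile each unit on
/// its own thread, then "link" the per-unit outputs in a deterministic order.
fn compile_in_parallel(items: &[&str], n_units: usize) -> Vec<String> {
    assert!(n_units > 0, "need at least one codegen unit");
    // ceil(len / n_units) items per unit; at least 1 so `chunks` doesn't panic.
    let unit_size = ((items.len() + n_units - 1) / n_units).max(1);
    let units: Vec<&[&str]> = items.chunks(unit_size).collect();

    let objects: Vec<Vec<String>> = thread::scope(|s| {
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || compile_unit(unit)))
            .collect();
        // Joining in spawn order keeps the final output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // The "linker": concatenate all per-unit outputs into one binary.
    objects.into_iter().flatten().collect()
}

fn main() {
    let items = ["main", "foo", "bar", "baz", "qux"];
    let binary = compile_in_parallel(&items, 2);
    println!("{binary:?}");
}
```

Because each unit is compiled without looking at the others, the threads need no shared mutable state; the only synchronization point is the final join before linking.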

Query System

The query model has some properties that make it feasible to evaluate multiple queries in parallel without too much effort:

All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
Query results are required to be immutable so they can safely be used by different threads concurrently.

When a query foo is evaluated, the cache table for foo is locked.

If there is already a result, we can clone it, release the lock, and we are done.
If there is no cache entry and no other active query invocation computing the same result, we mark the key as being “in progress”, release the lock and start evaluating.
If another query invocation for the same key is in progress, we release the lock and block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG, so some thread will always make progress.
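The three cases above can be modelled with a mutex-protected table plus a condition variable. This is a minimal sketch, not rustc's implementation: the names `QueryCache`, `Slot` and `get_or_compute` are hypothetical, and it ignores concerns such as cycle errors and panicking providers that a real query system must handle.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Condvar, Mutex};

/// State of one cache entry. Results are immutable once computed, so handing
/// out clones of an `Arc<V>` to many threads is safe.
enum Slot<V> {
    InProgress,
    Done(Arc<V>),
}

/// Hypothetical cache table for a single query.
struct QueryCache<K, V> {
    table: Mutex<HashMap<K, Slot<V>>>,
    // Signalled whenever an in-progress entry becomes Done.
    done: Condvar,
}

impl<K: Hash + Eq + Clone, V> QueryCache<K, V> {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), done: Condvar::new() }
    }

    /// Return the cached result for `key`, running `compute` only if no other
    /// invocation has produced (or is producing) it.
    fn get_or_compute(&self, key: K, compute: impl FnOnce(&K) -> V) -> Arc<V> {
        let mut table = self.table.lock().unwrap(); // lock the cache table
        loop {
            match table.get(&key) {
                // Case 1: result exists; clone it (lock released on return).
                Some(Slot::Done(v)) => return Arc::clone(v),
                // Case 3: someone else is computing it; wait below.
                Some(Slot::InProgress) => {}
                // Case 2: no entry; fall through and compute it ourselves.
                None => break,
            }
            // `wait` releases the lock while blocked and re-acquires it on
            // wake-up; the loop re-checks because wake-ups may be spurious.
            table = self.done.wait(table).unwrap();
        }
        table.insert(key.clone(), Slot::InProgress);
        drop(table); // release the lock while evaluating the provider
        let value = Arc::new(compute(&key));
        self.table.lock().unwrap().insert(key, Slot::Done(Arc::clone(&value)));
        self.done.notify_all(); // wake every thread blocked on this table
        value
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::thread;

    let cache = QueryCache::new();
    let evaluations = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..8 {
            s.spawn(|| {
                let v = cache.get_or_compute(10u64, |n| {
                    evaluations.fetch_add(1, Ordering::SeqCst);
                    n * n
                });
                assert_eq!(*v, 100);
            });
        }
    });
    // However the threads interleave, the provider ran exactly once.
    assert_eq!(evaluations.load(Ordering::SeqCst), 1);
}
```

Evaluating the provider outside the lock is what lets unrelated keys of the same query proceed concurrently; holding the lock during evaluation would serialize the whole query.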

Rustdoc

As of September 2021, there are still a number of steps to complete before rustdoc rendering can be made parallel. More details on this issue can be found here.

Current Status

As of July 2021, work on explicitly parallelizing the compiler has stalled. There is a lot of design and correctness work that needs to be done.

These are the basic ideas in the effort to make rustc parallel:

There are a lot of loops in the compiler that just iterate over all items in a crate. These can possibly be parallelized.
We can use (a custom fork of) rayon to run tasks in parallel. The custom fork allows the execution of DAGs of tasks, not just trees.
There are currently a lot of global data structures that need to be made thread-safe. A key strategy here has been converting interior-mutable data structures (e.g. Cell) into their thread-safe siblings (e.g. Mutex).
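The first and third ideas can be combined in a small sketch: a loop over all items of a crate is split across threads, and shared statistics that a single-threaded version might keep in a Cell or RefCell are moved into Mutexes, since Cell and RefCell are not Sync and cannot be shared between threads by reference. The `Item`, `Stats` and `check_items_parallel` names are hypothetical, and std::thread::scope (Rust 1.63+) is swapped in for rustc's rayon fork to keep the example dependency-free; it only covers this simple fork-join shape, not general DAGs of tasks.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical crate item; real rustc items are far richer.
struct Item {
    name: &'static str,
    lines: usize,
}

/// Shared statistics gathered while visiting items.
struct Stats {
    total_lines: Mutex<usize>,            // single-threaded version: Cell<usize>
    long_items: Mutex<Vec<&'static str>>, // single-threaded version: RefCell<Vec<..>>
}

/// Visit every item, splitting the loop over `n_threads` scoped threads.
fn check_items_parallel(items: &[Item], n_threads: usize) -> Stats {
    assert!(n_threads > 0, "need at least one thread");
    let stats = Stats { total_lines: Mutex::new(0), long_items: Mutex::new(Vec::new()) };
    let chunk = ((items.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        for part in items.chunks(chunk) {
            let stats = &stats; // share by reference; Mutex makes Stats Sync
            s.spawn(move || {
                for item in part {
                    // One lock per item: simple, but fine-grained locking like
                    // this is a common source of contention as threads grow.
                    *stats.total_lines.lock().unwrap() += item.lines;
                    if item.lines > 100 {
                        stats.long_items.lock().unwrap().push(item.name);
                    }
                }
            });
        }
    });
    stats
}

fn main() {
    let items = [
        Item { name: "main", lines: 20 },
        Item { name: "parse", lines: 250 },
        Item { name: "typeck", lines: 400 },
        Item { name: "util", lines: 5 },
    ];
    let stats = check_items_parallel(&items, 2);
    let total = stats.total_lines.into_inner().unwrap();
    let mut long = stats.long_items.into_inner().unwrap();
    long.sort(); // thread scheduling makes the push order nondeterministic
    println!("{total} lines, long items: {long:?}");
}
```

Note that the conversion changes more than types: the order in which threads append to long_items is no longer fixed, which is one example of an invariant that may silently stop holding under concurrency.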

As of February 2021, much of this effort is on hold due to a lack of manpower. We have a working prototype with promising performance gains in many cases. However, there are two blockers:

It's not clear what invariants need to be upheld that might not hold in the face of concurrency. An auditing effort was underway, but seems to have stalled at some point.
There is a lot of lock contention, which actually degrades performance as the number of threads increases beyond 4.

Here are some resources that can be used to learn more (note that some of them are a bit out of date):