As described in Overview of the compiler, the Rust compiler is still (as of July 2021) transitioning from a traditional “pass-based” setup to a “demand-driven” system. The compiler query system is the key to rustc's demand-driven organization. The idea is pretty simple. Instead of entirely independent passes (parsing, type-checking, etc.), a set of function-like queries compute information about the input source. For example, there is a query called type_of
that, given the DefId
of some item, will compute the type of that item and return it to you.
Query execution is memoized. The first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you invoke a query, the result may be returned to you by loading stored data from disk.[^incr-comp-detail]
Eventually, we want the entire compiler control-flow to be query driven. There will effectively be one top-level query (compile
) that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:
compile
query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM).Although this vision is not fully realized, large sections of the compiler (for example, generating MIR) currently work exactly like this.
[^incr-comp-detail]: The Incremental compilation in detail chapter gives a more in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read.
Invoking a query is simple. The TyCtxt
(“type context”) struct offers a method for each defined query. For example, to invoke the type_of
query, you would just do this:
let ty = tcx.type_of(some_def_id);
So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache – if your query has already been executed, then, the answer is simple: we clone the return value out of the cache and return it (therefore, you should try to ensure that the return types of queries are cheaply cloneable; insert an Rc
if necessary).
If, however, the query is not in the cache, then the compiler will call the corresponding provider function. A provider is a function implemented in a specific module and manually registered into the Providers
struct during compiler initialization. The macro system generates the Providers
struct, which acts as a function table for all query implementations, where each field is a function pointer to the actual provider.
Note: The Providers
struct is generated by macros and acts as a function table for all query implementations. It is not a Rust trait, but a plain struct with function pointer fields.
Providers are defined per-crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key. For example, when you invoke tcx.type_of(def_id)
, that could be a local query or an external query, depending on what crate the def_id
is referring to (see the self::keys::Key
trait for more information on how that works).
Providers always have the same signature:
fn provider<'tcx>( tcx: TyCtxt<'tcx>, key: QUERY_KEY, ) -> QUERY_RESULT { ... }
Providers take two arguments: the tcx
and the query key. They return the result of the query.
N.B. Most of the rustc_*
crates only provide local providers. Almost all extern providers wind up going through the rustc_metadata
crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a provide
and a provide_extern
function, through wasm_import_module_map
, that rustc_driver
can invoke.
When the tcx is created, it is given the providers by its creator using the Providers
struct. This struct is generated by the macros here, but it is basically a big list of function pointers:
struct Providers { type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>, // ... one field for each query }
The Providers
struct is filled in during compiler initialization, mainly by the rustc_driver
crate.
But the actual provider functions are implemented in various rustc_*
crates (like rustc_middle
, rustc_hir_analysis
, etc).
To register providers, each crate exposes a provide
function that looks like this:
pub fn provide(providers: &mut Providers) { *providers = Providers { type_of, // ... add more providers here ..*providers }; }
Providers
struct and sets the fields to point to the correct provider functions.providers.type_of = type_of;
.Suppose you want to add a new query called fubar
. You would:
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
provide
function:pub fn provide(providers: &mut Providers) { *providers = Providers { fubar, ..*providers }; }
How do you add a new query? Defining a query takes place in two steps:
To declare the query name and arguments, you simply add an entry to the big macro invocation in compiler/rustc_middle/src/query/mod.rs
. Then you need to add a documentation comment to it with some internal description. Then, provide the desc
attribute which contains a user-facing description of the query. The desc
attribute is shown to the user in query cycles.
This looks something like:
rustc_queries! { /// Records the type of every item. query type_of(key: DefId) -> Ty<'tcx> { cache_on_disk_if { key.is_local() } desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) } } ... }
A query definition has the following form:
query type_of(key: DefId) -> Ty<'tcx> { ... } ^^^^^ ^^^^^^^ ^^^^^ ^^^^^^^^ ^^^ | | | | | | | | | query modifiers | | | result type | | query key type | name of query query keyword
Let's go over these elements one by one:
tcx.type_of(..)
). Also used as the name of a struct (ty::queries::type_of
) that will be generated to represent this query.ty::query::keys::Key
trait, which defines (for example) how to map it to a crate, and so forth.RefCell
or other interior mutability and (b) be cheaply cloneable. Interning or using Rc
or Arc
is recommended for non-trivial data types.[^steal]So, to add a query:
rustc_queries!
using the format above.provide
method; or add a new one if needed and ensure that rustc_driver
is invoking it.[^steal]: The one exception to those rules is the ty::steal::Steal
type, which is used to cheaply modify MIR in place. See the definition of Steal
for more details. New uses of Steal
should not be added without alerting @rust-lang/compiler
.
Related design ideas, and tracking issues:
More discussion and issues: