blob: 1160ba818861b02938056333ed523b5a97712873 [file] [log] [blame] [view]
# Profiling Queries
<!-- toc -->
In an effort to support _incremental compilation_, the latest design of the Rust
compiler consists of a _query-based_ model.
The details of this model are outside the scope of this document,
however, we explain [some background of this model](#background), in an effort
to explain how we profile its performance. We intend this profiling effort to
address [issue 42678](https://github.com/rust-lang/rust/issues/42678).
## Quick Start
### 0. Enable debug assertions
```
./configure --enable-debug-assertions
```
### 1. Compile `rustc`
Compile the compiler, up to at least stage 1:
```
./x.py build library/std
```
### 2. Run `rustc`, with flags
Run the compiler on a source file, supplying two additional debugging flags with
`-Z`:
```
rustc -Z profile-queries -Z incremental=cache foo.rs
```
Regarding the two additional parameters:
- `-Z profile-queries` tells the compiler to run a separate thread that profiles
the queries made by the main compiler thread(s).
- `-Z incremental=cache` tells the compiler to "cache" various files that
describe the compilation dependencies, in the subdirectory `cache`.
This command will generate the following files:
- `profile_queries.html` consists of an HTML-based representation of the
[trace of queries](#trace-of-queries).
- `profile_queries.counts.txt` consists of a histogram, where each histogram
"bucket" is a query provider.
### 3. Run `rustc`, with `-Z time-passes`:
- This additional flag will add all timed passes to the output files mentioned
above, in step 2. As described below, these passes appear visually distinct
from the queries in the HTML output (they appear as green boxes, via
CSS).
### 4. Inspect the output
- 4(a). Open the HTML file (`profile_queries.html`) with a browser. See
[this section](#interpret-the-html-output) for an explanation of this file.
- 4(b). Open the data file (`profile_queries.counts.txt`) with a text editor, or
spreadsheet. See [this section](#interpret-the-data-output) for an explanation
of this file.
## Interpret the HTML Output
### Example 0
The following image gives some example output, from tracing the queries of
`hello_world.rs` (a single `main` function, that prints `"hello world"` via the
macro `println!`). This image only shows a short prefix of the total output; the
_actual_ output is much longer.
[![Example HTML output](./example-0.png)][profile-example-html]
[View full HTML output][profile-example-html]. Note; it could take up
to a second to properly render depending on your browser.
Here is the corresponding [text output](./example-0.counts.txt).
[profile-example-html]: ./example-0.html
### Example 0 explanation
The trace of the queries has a formal structure; see
[Trace of Queries](#trace-of-queries) for details.
We style this formal structure as follows:
- **Timed passes:** Green boxes, when present (via `-Z time-passes`), represent
_timed passes_ in the compiler. As of <!-- date: 2021-01 --> January 2021
these passes are not queries, but may be replaced by queries in future versions.
- **Labels:** Some green and red boxes are labeled with text. Where they are
present, the labels give the following information:
- The [query's _provider_](#queries), sans its _key_ and its _result_, which
are often too long to include in these labels.
- The _duration_ of the provider, as a fraction of the total time (for the
entire trace). This fraction includes the query's entire extent (that is,
the sum total of all of its sub-queries).
- **Query hits:** Blue dots represent query hits. They consist of leaves in the
trace's tree. (CSS class: `hit`).
- **Query misses:** Red boxes represent query misses. They consist of internal
nodes in the trace's tree. (CSS class: `miss`).
- **Nesting structure:** Many red boxes contain _nested boxes and dots_. This
nesting structure reflects that some providers _depend on_ results from other
providers, which consist of their nested children.
- Some red boxes are _labeled_ with text, and have highlighted borders (light
red, and bolded). (See [heuristics](#heuristics) for details).
## Heuristics
Heuristics-based CSS Classes:
- `important` -- Trace nodes are `important` if they have an extent of 6 (or
more), _or_ they have a duration fraction of one percent (or more). These
numbers are simple heuristics (hard-coded, but easy to modify).
Important nodes are styled with textual labels, and highlighted borders (light
red, and bolded).
- `frac-50`, `-40`, ... -- Trace nodes whose total duration (self and children)
take a large fraction of the total duration, at or above 50%, 40%, and so on.
We style nodes these with larger font and padding.
## Interpret the Data Output
The file `profile_queries.counts.txt` contains a table of information about the
queries, organized around their providers.
For each provider (or timed pass, when `-Z time-passes` is present), we produce:
- A total **count** --- the total number of times this provider was queried
- A total **duration** --- the total number of seconds spent running this
provider, _including_ all providers it may depend on. To get a sense of this
dependency structure, and inspect a more fine-grained view of these durations,
see [this section](#interpret-the-html-output).
These rows are **sorted by total duration**, in descending order.
### Counts: Example 0
The following example `profile_queries.counts.txt` file results from running on
a hello world program (a single main function that uses `println` to print
`"hello world"`).
As explained above, the columns consist of `provider/pass`, `count`, `duration`:
```
translation,1,0.891
symbol_name,2658,0.733
def_symbol_name,2556,0.268
item_attrs,5566,0.162
type_of,6922,0.117
generics_of,8020,0.084
serialize dep graph,1,0.079
relevant_trait_impls_for,50,0.063
def_span,24875,0.061
expansion,1,0.059
const checking,1,0.055
adt_def,1141,0.048
trait_impls_of,32,0.045
is_copy_raw,47,0.045
is_foreign_item,2638,0.042
fn_sig,2172,0.033
adt_dtorck_constraint,2,0.023
impl_trait_ref,2434,0.023
typeck_tables_of,29,0.022
item-bodies checking,1,0.017
typeck_item_bodies,1,0.017
is_default_impl,2320,0.017
borrow checking,1,0.014
borrowck,4,0.014
mir_validated,4,0.013
adt_destructor,10,0.012
layout_raw,258,0.010
load_dep_graph,1,0.007
item-types checking,1,0.005
mir_const,2,0.005
name resolution,1,0.004
is_object_safe,35,0.003
is_sized_raw,89,0.003
parsing,1,0.003
is_freeze_raw,11,0.001
privacy checking,1,0.001
privacy_access_levels,5,0.001
resolving dependency formats,1,0.001
adt_sized_constraint,9,0.001
wf checking,1,0.001
liveness checking,1,0.001
compute_incremental_hashes_map,1,0.001
match checking,1,0.001
type collecting,1,0.001
param_env,31,0.000
effect checking,1,0.000
trait_def,140,0.000
lowering ast -> hir,1,0.000
predicates_of,70,0.000
extern_crate,319,0.000
lifetime resolution,1,0.000
is_const_fn,6,0.000
intrinsic checking,1,0.000
translation item collection,1,0.000
impl_polarity,15,0.000
creating allocators,1,0.000
language item collection,1,0.000
crate injection,1,0.000
early lint checks,1,0.000
indexing hir,1,0.000
maybe creating a macro crate,1,0.000
coherence checking,1,0.000
optimized_mir,6,0.000
is_panic_runtime,33,0.000
associated_item_def_ids,7,0.000
needs_drop_raw,10,0.000
lint checking,1,0.000
complete gated feature checking,1,0.000
stability index,1,0.000
region_maps,11,0.000
super_predicates_of,8,0.000
coherent_trait,2,0.000
AST validation,1,0.000
loop checking,1,0.000
static item recursion checking,1,0.000
variances_of,11,0.000
associated_item,5,0.000
plugin loading,1,0.000
looking for plugin registrar,1,0.000
stability checking,1,0.000
describe_def,15,0.000
variance testing,1,0.000
codegen unit partitioning,1,0.000
looking for entry point,1,0.000
checking for inline asm in case the target doesn't support it,1,0.000
inherent_impls,1,0.000
crate_inherent_impls,1,0.000
trait_of_item,7,0.000
crate_inherent_impls_overlap_check,1,0.000
attribute checking,1,0.000
internalize symbols,1,0.000
impl wf inference,1,0.000
death checking,1,0.000
reachability checking,1,0.000
reachable_set,1,0.000
is_exported_symbol,3,0.000
is_mir_available,2,0.000
unused lib feature checking,1,0.000
maybe building test harness,1,0.000
recursion limit,1,0.000
write allocator module,1,0.000
assert dep graph,1,0.000
plugin registration,1,0.000
write metadata,1,0.000
```
# Background
We give some background about the query model of the Rust compiler.
## Def IDs
In the query model, many queries have a key that consists of a Def ID. The Rust
compiler uses Def IDs to distinguish definitions in the input Rust program.
From the compiler source code (`compiler/rustc_span/src/def_id.rs`):
```
/// A DefId identifies a particular *definition*, by combining a crate
/// index and a def index.
#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)]
pub struct DefId {
pub krate: CrateNum,
pub index: DefIndex,
}
```
## Queries
A query relates a _key_ to a _result_, either by invoking a _provider_ that
computes this result, or by reusing a cached result that was provided earlier.
We explain each term in more detail:
- Query **Provider**: Each kind of query has a pre-defined _provider_, which
refers to the compiler behavior that provides an answer to the query. These
providers may nest; see [trace of queries](#trace-of-queries) for more
information about this nesting structure.
_Example providers:_
- `typeck` -- Typecheck a Def ID; produce "tables" of type
information.
- `borrowck` -- Borrow-check a Def ID.
- `optimized_mir` -- Generate an optimized MIR for a Def ID; produce MIR.
- For more examples, see [Example 0](#counts-example-0).
- Query **Key**: The input/arguments to the provider. Often, this consists of a
particular [Def ID](#def-ids).
- Query **Result**: The output of the provider.
## Trace of Queries
Formally, a _trace_ of the queries consists of a _tree_, where sub-trees
represent sub-traces. In particular, the nesting structure of the trace of
queries describes how the queries depend on one another.
Even more precisely, this tree represents a directed acyclic graph (DAG), where
shared sub-graphs consist of tree nodes that occur multiple times in the tree,
first as "cache misses" and later as "cache hits".
**Cache hits and misses.** The trace is a tree with the following possible tree
nodes:
- Query, with cache **miss**: The query's result is **unknown**, and its
provider runs to compute it. In this case, the dynamic extent of the query's
trace consists of the traced behavior of its provider.
- Query, with cache **hit**: The query's result is **known**, and is reused; its
provider does not rerun. These nodes are leaves in the trace, since they have
no dynamic extent. These leaves also represent where the tree, represented as
a DAG, would _share_ a sub-graph (namely, the sub-graph of the query that was
reused from the cache).
**Tree node metrics.** To help determine how to style this tree, we define the
following tree node metrics:
- **Depth**: The number of **ancestors** of the node in its path from the tree
root.
- **Extent**: The number of **immediate children** of the node.
Intuitively, a dependency tree is "good" for incremental caching when the depth
and extent of each node is relatively small. It is pathological when either of
these metrics grows too large. For instance, a tree node whose extent consists
of 1M immediate children means that if and when this node is re-computed, all 1M
children must be re-queried, at the very least (some may also require
recomputation, too).
## External Links
Related design ideas, and tracking issues:
- Design document:
[On-demand Rustc incremental design doc](https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md)
- Tracking Issue:
["Red/Green" dependency tracking in compiler](https://github.com/rust-lang/rust/issues/42293)
More discussion and issues:
- [GitHub issue #42633](https://github.com/rust-lang/rust/issues/42633)
- [Incremental Compilation Beta](https://internals.rust-lang.org/t/incremental-compilation-beta/4721)
- [Incremental Compilation Announcement](https://blog.rust-lang.org/2016/09/08/incremental.html)