Miri can be traced to understand how much time is spent in its various components (e.g. borrow tracker, data race checker, etc.). When tracing is enabled, running Miri will create a .json
trace file that can be opened and analyzed in Perfetto. For any questions regarding this documentation you may contact Stypox on Zulip.
All of the tracing functionality in Miri is gated by the "tracing"
feature flag to ensure it does not create any overhead when unneeded. To compile Miri with this feature enabled, you can pass --features=tracing
to ./miri
. Then, to make running Miri actually produce a trace file, you also need to set the MIRI_TRACING
environment variable. For example:
MIRI_TRACING=1 ./miri run --features=tracing ./tests/pass/hello.rs
If you are building Miri from within the rustc tree, you need to enable the "tracing"
feature by adding this line to bootstrap.toml
:
build.tool.miri.features = ["tracing"]
And then you could run the following:
MIRI_TRACING=1 ./x.py run miri --stage 1 --args ./src/tools/miri/tests/pass/hello.rs
After running Miri with tracing enabled you will get a .json
trace file that contains a list of all events and spans that occurred throughout the execution. The file follows this format.
To analyze traces you can use Perfetto UI, a trace analyzer made by Google that was originally a part of the Chrome browser. Just open Perfetto and drag and drop the .json
file there. Official documentation for the controls in the UI can be found here.
You will see the boxes “Global Legacy Events” and “Process 1” on the left of the workspace: after clicking on either of them their timeline will expand and you will be able to zoom in and look at individual spans (and events).
Spans are represented as colored boxes in the timeline, while instantaneous events are represented by tiny arrows. (Events exist because rustc and Miri also use the tracing
crate for debug logging, and those logs turn into events in the trace.)
You can click on a span or an event to get more information about it, including some arguments that were passed when the span/event was entered/fired. In the following screenshot you can see the details of a “layouting” span that was generated by the following line in Miri's code:
let _trace = enter_trace_span!(M, layouting::fn_abi_of_instance, ?instance, ?extra_args);
Perfetto supports querying the span/event database using SQL queries (see the docs). Just type :
in the search bar at the top to enter SQL mode, and then you will be able to enter SQL queries there. The relevant SQL tables are:
slices
: contains all spans and events; events can be distinguished from spans since their dur
is 0. Relevant columns are:id
: a unique primary-key ID for the span (assigned by Perfetto, not present in the trace file)ts
and dur
: the beginning and duration of the span, in nanosecondsname
: the name of the spanparent_id
: the parent span ID, or null if there is no parent (assigned by Perfetto based on the timing at which spans occur, i.e. two nested spans must be one the child of the other)arg_set_id
: a foreign key into the table of arguments (1-to-N)args
: contains all of the arguments of the various events/spans. Relevant columns are:arg_set_id
: the key used to join the slices and args tableskey
: the name of the argument prepended with “args.”display_value
: the value of the argumentSome useful queries are provided in the following sections.
On the “Process 1” timeline line there are some spans with the same name, that are actually generated from different places in Miri's code. In those cases the span name indicates the component that was invoked (e.g. the data race checker), but not the specific function that was run. To inspect the specific function, we store a “subname” in an argument with the same name as the span, which unfortunately can be seen only after clicking on the span.
To make it quicker to look at subnames, you can add a new timeline line that specifically shows the subnames for spans with a specific name. To do so:
$NAME
)$NAME
(or args.$NAME
)$NAME
, but now with the subname displayed insteadThe following screenshot shows the 4 steps for spans named “data_race”:
Unfortunately the instructions in Enhancing the timeline only work well with spans under “Process 1”, but misbehave with spans under “Global Legacy Events” (see the screenshot below). This might be a bug in Perfetto, but nevertheless a workaround is available:
:
to enter SQL modeselect slices.id, ts, dur, track_id, category, args.string_value as name, depth, stack_id, parent_stack_id, parent_id, slices.arg_set_id, thread_ts, thread_instruction_count, thread_instruction_delta, cat, slice_id from slices inner join args using (arg_set_id) where args.key = "args." || name and name = "SPAN_NAME"
What the SQL does is to select only spans with the name “SPAN_NAME” and keep all of the span fields untouched, except for the name which is replaced with the subname. As explained in Enhancing the timeline, remember that the subname is stored in an argument with the same name as the span.
The simplest way to get aggregate statistics about a time range is to:
Note that the numbers shown in the “Slices” and “Pivot Table” tabs also include nested spans, so they cannot be used to compute statistics such as “X% of time is spent in spans named Y” because two spans named Y might be nested and their duration would be counted twice. For such statistics use the method in Compute aggregate statistics (enhanced).
The following (long but not complicated) query can be used to find out how much time is spent in spans (grouped by their name). Only spans without a parent are considered towards the computations (see where parent_id is null
): so for example if validate_operand
in turn calls layouting
(which generates a nested/child span), only the validate_operand
statistics are increased. This query also excludes auxiliary spans (see name != "frame" and name != "step"
).
Note that this query does not allow selecting a time range, but that can be done by adding a condition, e.g. ts + dur > MIN_T and ts < MAX_T
would match only spans that intersect the range (MIN_T, MAX_T)
. Remember that the time unit is nanoseconds.
select "TOTAL PROGRAM DURATION" as name, count(*), max(ts + dur) as "sum(dur)", 100.0 as "%", null as "min(dur)", null as "max(dur)", null as "avg(dur)", null as "stddev(dur)" from slices union select "TOTAL OVER ALL SPANS (excluding events)" as name, count(*), sum(dur), cast(cast(sum(dur) as float) / (select max(ts + dur) from slices) * 1000 as int) / 10.0 as "%", min(dur), max(dur), cast(avg(dur) as int) as "avg(dur)", cast(sqrt(avg(dur*dur)-avg(dur)*avg(dur)) as int) as "stddev(dur)" from slices where parent_id is null and name != "frame" and name != "step" and dur > 0 union select name, count(*), sum(dur), cast(cast(sum(dur) as float) / (select max(ts + dur) from slices) * 1000 as int) / 10.0 as "%", min(dur), max(dur), cast(avg(dur) as int) as "avg(dur)", cast(sqrt(avg(dur*dur)-avg(dur)*avg(dur)) as int) as "stddev(dur)" from slices where parent_id is null and name != "frame" and name != "step" group by name order by sum(dur) desc, count(*) desc
This is the kind of table you would get out:
Use the following SQL to see statistics about the subnames of spans with the same name (replace “SPAN_NAME” with the name of the span you want to see subname statistics of):
select args.string_value as name, count(*), sum(dur), min(dur), max(dur), cast(avg(dur) as int) as "avg(dur)", cast(sqrt(avg(dur*dur)-avg(dur)*avg(dur)) as int) as "stddev(dur)" from slices inner join args using (arg_set_id) where args.key = "args." || name and name = "SPAN_NAME" group by args.string_value order by count(*) desc
For example, this is the table of how much time is spent in each borrow tracker function:
The following SQL finds the longest periods of time where time is being spent, with the ability to click on IDs in the table of results to quickly reach the corresponding place. This can be useful to spot things that use up a significant amount of time but that are not yet covered by tracing calls.
with ordered as ( select s1.*, row_number() over (order by s1.ts) as rn from slices as s1 where s1.parent_id is null and s1.dur > 0 and s1.name != "frame" and s1.name != "step" ) select a.ts+a.dur as ts, b.ts-a.ts-a.dur as dur, a.id, a.track_id, a.category, a.depth, a.stack_id, a.parent_stack_id, a.parent_id, a.arg_set_id, a.thread_ts, a.thread_instruction_count, a.thread_instruction_delta, a.cat, a.slice_id, "empty" as name from ordered as a inner join ordered as b on a.rn=b.rn-1 order by b.ts-a.ts-a.dur desc
Unfortunately Perfetto does not seem to support saving the UI state as a preset that can be used to repeat the same analysis on multiple traces. You have to click through the various menus or run the various SQL queries every time to setup the UI as you want.
Miri is highly interconnected with rustc_const_eval
, and therefore collecting proper trace data about Miri also involves adding some tracing calls within rustc_const_eval
‘s codebase. As explained in Obtaining a trace file, tracing calls are disabled (and optimized out) when Miri’s “tracing” feature is not enabled. However, while it is possible to check for the feature from Miri‘s codebase, it’s not possible to do so from rustc_const_eval
(since it‘s a separate crate, and it’s even in a precompiled .rlib
in case of out-of-tree builds).
The solution to make it possible to check whether tracing is enabled at compile time even in rustc_const_eval
was to add a function with this signature to the Machine
trait:
fn enter_trace_span(span: impl FnOnce() -> tracing::Span) -> impl EnteredTraceSpan
where EnteredTraceSpan
is just a marker trait implemented by ()
and tracing::span::EnteredSpan
. This function returns ()
by default (without calling the span
closure), except in MiriMachine
where if tracing is enabled it will return span().entered()
.
The code in rustc_const_eval
calls this function when it wants to do tracing, and the compiler will (hopefully) optimize out tracing calls when tracing is disabled.
enter_trace_span!()
macroTo add tracing to a section of code in Miri or in rustc_const_eval
, you can use the enter_trace_span!()
macro, which takes care of the details explained in The “tracing” feature.
The enter_trace_span!()
macro accepts the same syntax as tracing::span!()
(documentation) except for a few customizations, and returns an already entered trace span. The returned value is a drop guard that will exit the span when dropped, so make sure to give it a proper scope by storing it in a variable like this:
let _trace = enter_trace_span!("My span");
When calling this macro from rustc_const_eval
you need to pass a type implementing the Machine
trait as the first argument (since it will be used to call Machine::enter_trace_span()
). This is usually available in various parts of rustc_const_eval
under the name M
, since most of rustc_const_eval
's code is Machine
-agnostic.
let _trace = enter_trace_span!("My span"); // from Miri let _trace = enter_trace_span!(M, "My span"); // from rustc_const_eval
You can make sense of the syntaxes explained below also by looking at this Perfetto screenshot from Span/event data.
tracing::span!()
The full documentation for the tracing::span!()
syntax can be found here under “Using the Macros”. A few possibly confusing syntaxes are listed here:
// logs a span named "hello" with a field named "arg" of value 42 (works only because // 42 implements the tracing::Value trait, otherwise use one of the options below) let _trace = enter_trace_span!(M, "hello", arg = 42); // logs a field called "my_display_var" using the Display implementation let _trace = enter_trace_span!(M, "hello", %my_display_var); // logs a field called "my_debug_var" using the Debug implementation let _trace = enter_trace_span!(M, "hello", ?my_debug_var);
NAME::SUBNAME
syntaxIn addition to the syntax accepted by tracing::span!()
, the enter_trace_span!()
macro optionally allows passing the span name (i.e. the first macro argument) in the form NAME::SUBNAME
(without quotes) to indicate that the span has name “NAME” (usually the name of the component) and has an additional more specific name “SUBNAME” (usually the function name). The latter is passed to the tracing crate as a span field with the name “NAME”. This allows not being distracted by subnames when looking at the trace in Perfetto, but when deeper introspection is needed within a component, it's still possible to view the subnames directly with a few steps (see Enhancing the timeline).
// for example, the first will expand to the second let _trace = enter_trace_span!(M, borrow_tracker::on_stack_pop); let _trace = enter_trace_span!(M, "borrow_tracker", borrow_tracker = "on_stack_pop");
tracing_separate_thread
parameterMiri saves traces using the the tracing_chrome
tracing::Layer
so that they can be visualized in Perfetto. To instruct tracing_chrome
to put some spans on a separate trace thread/line than other spans when viewed in Perfetto, you can pass tracing_separate_thread = tracing::field::Empty
to the tracing macros. This is useful to separate out spans which just indicate the current step or program frame being processed by the interpreter. As explained in The timeline, those spans end up under the “Global Legacy Events” track. You should use a value of tracing::field::Empty
so that other tracing layers (e.g. the logger) will ignore the tracing_separate_thread
field. For example:
let _trace = enter_trace_span!(M, step::eval_statement, tracing_separate_thread = tracing::field::Empty);
The EnteredTraceSpan
trait contains a or_if_tracing_disabled()
function that you can use to e.g. log a line as an alternative to the tracing span for when tracing is disabled. For example:
let _trace = enter_trace_span!(M, step::eval_statement) .or_if_tracing_disabled(|| tracing::info!("eval_statement"));
Here we explain how tracing is implemented internally.
The events and spans generated throughout the codebase are collected by the tracing
crate, which then dispatches them to the code that writes to the trace file, but also to the logger if logging is enabled.
The crate that was chosen for collecting traces is tracing, since:
Layer
s (in Miri we are using tracing_chrome
to export traces for perfetto, see The tracing_chrome
layer)One major drawback of the tracing crate is, however, its big overhead. Entering and exiting a span takes on the order of 100ns, and many of Miri's spans are shorter than that, so their measurements are completely off and the program execution increases significantly. E.g. at the point of writing this documentation, enabling tracing makes Miri 5x slower. Note that this used to be even worse, see Time measurements.
tracing_chrome
layerMiri uses tracing-chrome as the Layer
that collects spans and events from the tracing crate and saves them to a file that can be opened in Perfetto. Although the crate is published on crates.io, it was not possible to depend on it from Miri, because it would bring in a separate compilation of the tracing
crate. This is because Miri does not directly depend on tracing
, and instead uses rustc‘s version through rustc-private, and apparently cargo can’t realize that the same library is being built again when rustc-private is involved.
So the solution was to copy-paste the only file in tracing-chrome into Miri. Nevertheless, this gave the possibility to make some changes to tracing-chrome, which you can read about in documentation at the top of the file that was copied to Miri.
tracing-chrome originally used std::time::Instant
to measure time, however on some x86/x86_64 Linux systems it might be unbearably slow since the underlying system call (clock_gettime
) would take ≈1.3µs. Read more here about how the Linux kernel chooses the clock source.
Therefore, on x86/x86_64 Linux systems with a CPU that has an invariant TSC counter, we read from that instead to measure time, which takes only ≈13ns. There are unfortunately a lot of caveats to this approach though, as explained in the code and in the PR. The most impactful one is that: every thread spawned in Miri that wants to trace something (which requires measuring time) needs to pin itself to a single CPU core (using sched_setaffinity
).
After compiling Miri, you can run the following command to make a flamegraph using Linux' perf
. It can be useful to spot functions that use up a significant amount of time but that are not yet covered by tracing calls.
perf record --call-graph dwarf -F 999 ./miri/target/debug/miri --edition 2021 --sysroot ~/.cache/miri ./tests/pass/hashmap.rs && perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg