blob: 73fa68d36076ade0bcf40ab0ab6afc03ba506577 [file] [log] [blame] [view] [edit]
# Rust Codegen
The first phase in debug info generation requires Rust to inspect the MIR of the program and
communicate it to LLVM. This is primarily done in [`rustc_codegen_llvm/debuginfo`][llvm_di], though
some type-name processing exists in [`rustc_codegen_ssa/debuginfo`][ssa_di]. Rust communicates to
LLVM via the `DIBuilder` API - a thin wrapper around LLVM's internals that exists in
[rustc_llvm][rustc_llvm].
[llvm_di]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_llvm/src/debuginfo
[ssa_di]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_ssa/src/debuginfo
[rustc_llvm]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_llvm
# Type Information
Type information typically consists of the type name, size, alignment, as well as things like
fields, generic parameters, and storage modifiers if they are relevant. Much of this work happens in
[rustc_codegen_llvm/src/debuginfo/metadata][di_metadata].
[di_metadata]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_llvm/src/debuginfo/metadata.rs
It is important to keep in mind that the goal is not necessarily "represent types exactly how they
appear in Rust", rather it is to represent them in a way that allows debuggers to most accurately
reconstruct the data during debugging. This distinction is vital to understanding the core work that
occurs on this layer; many changes made here will be for the purpose of working around debugger
limitations when no other option will work.
## Quirks
Rust's generated DI nodes "pretend" to be C/C++ for both CDB and LLDB's sake. This can result in
some unintuitive and non-idiomatic debug info.
### Pointers and Reference
Wide pointers/references/`Box` are treated as a struct with 2 fields: `data_ptr` and `length`.
All non-wide pointers, references, and `Box` pointers are output as pointer nodes, and no
distinction is made between `mut` and non-`mut`. Several attempts have been made to rectify this,
but unfortunately there is not a straightforward solution. Using the `reference` DI nodes of the
respective formats has pitfalls. There is a semantic difference between C++ references and Rust
references that is unreconcilable.
>From [cppreference](https://en.cppreference.com/w/cpp/language/reference.html):
>
>References are not objects; **they do not necessarily occupy storage**, although the compiler may
>allocate storage if it is necessary to implement the desired semantics (e.g. a non-static data
>member of reference type usually increases the size of the class by the amount necessary to store
>a memory address).
>
>Because references are not objects, **there are no arrays of references, no pointers to references, and no references to references**
The current proposed solution is to simply [typedef the pointer nodes][issue_144394].
[issue_144394]: https://github.com/rust-lang/rust/pull/144394
Using the `const` qualifier to denote non-`mut` poses potential issues due to LLDB's internal
optimizations. In short, LLDB attempts to cache the child-values of variables (e.g. struct fields,
array elements) when stepping through code. A heuristic is used to determine which values are safely
cache-able, and `const` is part of that heuristic. Research has not been done into how this would
interact with things like Rust's interior mutability constructs.
### DWARF vs PDB
While most of the type information is fairly straight forward, one notable issue is the debug info
format of the target. Each format has different semantics and limitations, as such they require
slightly different debug info in some cases. This is gated by calls to
[`cpp_like_debuginfo`][cpp_like].
[cpp_like]: https://github.com/rust-lang/rust/blob/main/compiler/rustc_codegen_ssa/src/debuginfo/type_names.rs#L813
### Naming
Rust attempts to communicate type names as accurately as possible, but debuggers and debug info
formats do not always respect that.
Due to limitations in MSVC's expression parser, the following name transformations are made for PDB
debug info:
| Rust name | MSVC name |
| --- | --- |
| `&str`/`&mut str` | `ref$<str$>`/`ref_mut$<str$>` |
| `&[T]`/`&mut [T]` | `ref$<slice$<T> >`/`ref_mut$<slice$<T> >`[^1] |
| `[T; N]` | `array$<T, N>` |
| `RustEnum` | `enum2$<RustEnum>` |
| `(T1, T2)` | `tuple$<T1, T2>`|
| `*const T` | `ptr_const$<T>` |
| `*mut T` | `ptr_mut$<T>` |
| `usize` | `size_t`[^2] |
| `isize` | `ptrdiff_t`[^2] |
| `uN` | `unsigned __intN`[^2] |
| `iN` | `__intN`[^2] |
| `f32` | `float`[^2] |
| `f64` | `double`[^2] |
| `f128` | `fp128`[^2] |
[^1]: MSVC's expression parser will treat `>>` as a right shift. It is necessary to separate
consecutive `>`'s with a space (`> >`) in type names.
[^2]: While these type names are generated as part of the debug info node (which is then wrapped in
a typedef node with the Rust name), once the LLVM-IR node is converted to a CodeView node, the type
name information is lost. This is because CodeView has special shorthand nodes for primitive types,
and those shorthand nodes to not have a "name" field.
### Generics
Rust outputs generic *type* information (`T` in `ArrayVec<T, N: usize>`), but not generic *value*
information (`N` in `ArrayVec<T, N: usize>`).
CodeView does not have a leaf node for generics/C++ templates, so all generic information is lost
when generating PDB debug info. There are workarounds that allow the debugger to retrieve the
generic arguments via the type name, but it is fragile solution at best. Efforts are being made to
contact Microsoft to correct this deficiency, and/or to use one of the unused CodeView node types as
a suitable equivalent.
### Type aliases
Rust outputs typedef nodes in several cases to help account for debugger limitiations, but it does
not currently output nodes for [type aliases in the source code][type_aliases].
[type_aliases]: https://doc.rust-lang.org/reference/items/type-aliases.html
### Enums
Enum DI nodes are generated in [rustc_codegen_llvm/src/debuginfo/metadata/enums][di_metadata_enums]
[di_metadata_enums]: https://github.com/rust-lang/rust/tree/main/compiler/rustc_codegen_llvm/src/debuginfo/metadata/enums
#### DWARF
DWARF has a dedicated node for discriminated unions: `DW_TAG_variant`. It is a container that
references `DW_TAG_variant_part` nodes that may or may not contain a discriminant value. The
hierarchy looks as follows:
```txt
DW_TAG_structure_type (top-level type for the coroutine)
DW_TAG_variant_part (variant part)
DW_AT_discr (reference to discriminant DW_TAG_member)
DW_TAG_member (discriminant member)
DW_TAG_variant (variant 1)
DW_TAG_variant (variant 2)
DW_TAG_variant (variant 3)
DW_TAG_structure_type (type of variant 1)
DW_TAG_structure_type (type of variant 2)
DW_TAG_structure_type (type of variant 3)
```
#### PDB
PDB does not have a dedicated node, so it generates the C equivalent of a discriminated union:
```c
union enum2$<RUST_ENUM_NAME> {
enum VariantNames {
First,
Second
};
struct Variant0 {
struct First {
// fields
};
static const enum2$<RUST_ENUM_NAME>::VariantNames NAME;
static const unsigned long DISCR_EXACT;
enum2$<RUST_ENUM_NAME>::Variant0::First value;
};
struct Variant1 {
struct Second {
// fields
};
static enum2$<RUST_ENUM_NAME>::VariantNames NAME;
static unsigned long DISCR_EXACT;
enum2$<RUST_ENUM_NAME>::Variant1::Second value;
};
enum2$<RUST_ENUM_NAME>::Variant0 variant0;
enum2$<RUST_ENUM_NAME>::Variant1 variant1;
unsigned long tag;
}
```
An important note is that due to limitations in LLDB, the `DISCR_*` value generated is always a
`u64` even if the value is not `#[repr(u64)]`. This is largely a non-issue for LLDB because the
`DISCR_*` value and the `tag` are read into `uint64_t` values regardless of their type.
# Source Information
TODO