|  | # Debugging support in the Rust compiler | 
|  |  | 
|  | This document explains the state of debugging tools support in the Rust compiler (rustc). | 
|  | It gives an overview of GDB, LLDB, WinDbg/CDB, | 
|  | as well as infrastructure around Rust compiler to debug Rust code. | 
|  | If you want to learn how to debug the Rust compiler itself, | 
|  | see [Debugging the Compiler]. | 
|  |  | 
|  | The material is gathered from the video, | 
|  | [Tom Tromey discusses debugging support in rustc]. | 
|  |  | 
|  | ## Preliminaries | 
|  |  | 
|  | ### Debuggers | 
|  |  | 
|  | According to Wikipedia | 
|  |  | 
|  | > A [debugger or debugging tool] is a computer program that is used to test and debug | 
|  | > other programs (the "target" program). | 
|  |  | 
|  | Writing a debugger from scratch for a language requires a lot of work, especially if | 
|  | debuggers have to be supported on various platforms. GDB and LLDB, however, can be | 
|  | extended to support debugging a language. This is the path that Rust has chosen. | 
|  | This document's main goal is to document the said debuggers support in Rust compiler. | 
|  |  | 
|  | ### DWARF | 
|  |  | 
|  | According to the [DWARF] standard website | 
|  |  | 
|  | > DWARF is a debugging file format used by many compilers and debuggers to support source level | 
|  | > debugging. It addresses the requirements of a number of procedural languages, | 
|  | > such as C, C++, and Fortran, and is designed to be extensible to other languages. | 
|  | > DWARF is architecture independent and applicable to any processor or operating system. | 
|  | > It is widely used on Unix, Linux and other operating systems, | 
|  | > as well as in stand-alone environments. | 
|  |  | 
|  | DWARF reader is a program that consumes the DWARF format and creates debugger compatible output. | 
|  | This program may live in the compiler itself.  DWARF uses a data structure called | 
|  | Debugging Information Entry (DIE) which stores the information as "tags" to denote functions, | 
|  | variables etc., e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, `DW_TAG_subprogram` etc. | 
|  | You can also invent your own tags and attributes. | 
|  |  | 
|  | ### CodeView/PDB | 
|  |  | 
|  | [PDB] (Program Database) is a file format created by Microsoft that contains debug information. | 
|  | PDBs can be consumed by debuggers such as WinDbg/CDB and other tools to display debug information. | 
|  | A PDB contains multiple streams that describe debug information about a specific binary such | 
|  | as types, symbols, and source files used to compile the given binary. CodeView is another | 
|  | format which defines the structure of [symbol records] and [type records] that appear within | 
|  | PDB streams. | 
|  |  | 
|  | ## Supported debuggers | 
|  |  | 
|  | ### GDB | 
|  |  | 
|  | #### Rust expression parser | 
|  |  | 
|  | To be able to show debug output, we need an expression parser. | 
|  | This (GDB) expression parser is written in [Bison], | 
|  | and can parse only a subset of Rust expressions. | 
|  | GDB parser was written from scratch and has no relation to any other parser, | 
|  | including that of rustc. | 
|  |  | 
|  | GDB has Rust-like value and type output. It can print values and types in a way | 
|  | that look like Rust syntax in the output. Or when you print a type as [ptype] in GDB, | 
|  | it also looks like Rust source code. Checkout the documentation in the [manual for GDB/Rust]. | 
|  |  | 
|  | #### Parser extensions | 
|  |  | 
|  | Expression parser has a couple of extensions in it to facilitate features that you cannot do | 
|  | with Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special | 
|  | code in the DWARF reader in GDB to support the extensions. | 
|  |  | 
|  | A couple of examples of DWARF reader support needed are as follows: | 
|  |  | 
|  | 1. Enum: Needed for support for enum types. | 
|  | The Rust compiler writes the information about enum into DWARF, | 
|  | and GDB reads the DWARF to understand where is the tag field, | 
|  | or if there is a tag field, | 
|  | or if the tag slot is shared with non-zero optimization etc. | 
|  |  | 
|  | 2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF | 
|  | also points to a stub description of the corresponding vtable which in turn points to the | 
|  | concrete type for which this trait object exists. This means that you can do a `print *object` | 
|  | for that trait object, and GDB will understand how to find the correct type of the payload in | 
|  | the trait object. | 
|  |  | 
|  | **TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than | 
|  | this guide page so there is no duplication. This is regarding the following comments: | 
|  |  | 
|  | [This comment by Tom](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r284027340) | 
|  | > gdb's Rust extensions and limitations are documented in the gdb manual: | 
|  | https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that | 
|  | gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser | 
|  | implements the gdb @ extension. | 
|  |  | 
|  | [This question by Aman](https://github.com/rust-lang/rustc-dev-guide/pull/316#discussion_r285401353) | 
|  | > @tromey do you think we should mention this part in the GDB-Rust document rather than this | 
|  | document so there is no duplication etc.? | 
|  |  | 
|  | ### LLDB | 
|  |  | 
|  | #### Rust expression parser | 
|  |  | 
|  | This expression parser is written in C++. It is a type of [Recursive Descent parser]. | 
|  | It implements slightly less of the Rust language than GDB. | 
|  | LLDB has Rust-like value and type output. | 
|  |  | 
|  | #### Developer notes | 
|  |  | 
|  | * LLDB has a plugin architecture but that does not work for language support. | 
|  | * GDB generally works better on Linux. | 
|  |  | 
|  | ### WinDbg/CDB | 
|  |  | 
|  | Microsoft provides [Windows Debugging Tools] such as the Windows Debugger (WinDbg) and | 
|  | the Console Debugger (CDB) which both support debugging programs written in Rust. These | 
|  | debuggers parse the debug info for a binary from the `PDB`, if available, to construct a | 
|  | visualization to serve up in the debugger. | 
|  |  | 
|  | #### Natvis | 
|  |  | 
|  | Both WinDbg and CDB support defining and viewing custom visualizations for any given type | 
|  | within the debugger using the Natvis framework. The Rust compiler defines a set of Natvis | 
|  | files that define custom visualizations for a subset of types in the standard libraries such | 
|  | as, `std`, `core`, and `alloc`. These Natvis files are embedded into `PDBs` generated by the | 
|  | `*-pc-windows-msvc` target triples to automatically enable these custom visualizations when | 
|  | debugging. This default can be overridden by setting the `strip` rustc flag to either `debuginfo` | 
|  | or `symbols`. | 
|  |  | 
|  | Rust has support for embedding Natvis files for crates outside of the standard libraries by | 
|  | using the `#[debugger_visualizer]` attribute. | 
|  | For more details on how to embed debugger visualizers, | 
|  | please refer to the section on the [`debugger_visualizer` attribute]. | 
|  |  | 
|  | ## DWARF and `rustc` | 
|  |  | 
|  | [DWARF] is the standard way compilers generate debugging information that debuggers read. | 
|  | It is _the_ debugging format on macOS and Linux. | 
|  | It is a multi-language and extensible format, | 
|  | and is mostly good enough for Rust's purposes. | 
|  | Hence, the current implementation reuses DWARF's concepts. | 
|  | This is true even if some of the concepts in DWARF do not align with Rust semantically because, | 
|  | generally, there can be some kind of mapping between the two. | 
|  |  | 
|  | We have some DWARF extensions that the Rust compiler emits and the debuggers understand that | 
|  | are _not_ in the DWARF standard. | 
|  |  | 
|  | * Rust compiler will emit DWARF for a virtual table, and this `vtable` object will have a | 
|  | `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object | 
|  | pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb | 
|  | repository: | 
|  |  | 
|  | ```asm | 
|  | <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type) | 
|  | <1aa>   DW_AT_containing_type: <0x1b4> | 
|  | <1ae>   DW_AT_name        : (indirect string, offset: 0x23d): vtable | 
|  | <1b2>   DW_AT_byte_size   : 0 | 
|  | <1b3>   DW_AT_alignment   : 8 | 
|  | ``` | 
|  |  | 
|  | * The other extension is that the Rust compiler can emit a tagless discriminated union. | 
|  | See [DWARF feature request] for this item. | 
|  |  | 
|  | ### Current limitations of DWARF | 
|  |  | 
|  | * Traits - require a bigger change than normal to DWARF, on how to represent Traits in DWARF. | 
|  | * DWARF provides no way to differentiate between Structs and Tuples. Rust compiler emits | 
|  | fields with `__0` and debuggers look for a sequence of such names to overcome this limitation. | 
|  | For example, in this case the debugger would look at a field via `x.__0` instead of `x.0`. | 
|  | This is resolved via the Rust parser in the debugger so now you can do `x.0`. | 
|  |  | 
|  | DWARF relies on debuggers to know some information about platform ABI. | 
|  | Rust does not do that all the time. | 
|  |  | 
|  | ## Developer notes | 
|  |  | 
|  | This section is from the talk about certain aspects of development. | 
|  |  | 
|  | ## What is missing | 
|  |  | 
|  | ### Code signing for LLDB debug server on macOS | 
|  |  | 
|  | According to Wikipedia, [System Integrity Protection] is | 
|  |  | 
|  | > System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature | 
|  | > of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of | 
|  | > mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned | 
|  | > files and directories against modifications by processes without a specific "entitlement", | 
|  | > even when executed by the root user or a user with root privileges (sudo). | 
|  |  | 
|  | It prevents processes using `ptrace` syscall. If a process wants to use `ptrace` it has to be | 
|  | code signed. The certificate that signs it has to be trusted on your machine. | 
|  |  | 
|  | See [Apple developer documentation for System Integrity Protection]. | 
|  |  | 
|  | We may need to sign up with Apple and get the keys to do this signing. Tom has looked into if | 
|  | Mozilla cannot do this because it is at the maximum number of | 
|  | keys it is allowed to sign. Tom does not know if Mozilla could get more keys. | 
|  |  | 
|  | Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple. | 
|  | This problem is not technical in nature. If we had such a key we could sign GDB as well and | 
|  | ship that. | 
|  |  | 
|  | ### DWARF and Traits | 
|  |  | 
|  | Rust traits are not emitted into DWARF at all. The impact of this is calling a method `x.method()` | 
|  | does not work as is. The reason being that method is implemented by a trait, as opposed | 
|  | to a type. That information is not present so finding trait methods is missing. | 
|  |  | 
|  | DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this | 
|  | interface type as traits. | 
|  |  | 
|  | DWARF only deals with concrete names, not the reference types. So, a given implementation of a | 
|  | trait for a type would be one of these interfaces (`DW_tag_interface` type). Also, the type for | 
|  | which it is implemented would describe all the interfaces this type implements. This requires a | 
|  | DWARF extension. | 
|  |  | 
|  | Issue on Github: [https://github.com/rust-lang/rust/issues/33014] | 
|  |  | 
|  | ## Typical process for a Debug Info change (LLVM) | 
|  |  | 
|  | LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into. | 
|  | This is why we need to change LLVM first because that is emitted first and not DWARF directly. | 
|  | This is a kind of metadata that you construct and hand-off to LLVM. For the Rustc/LLVM hand-off | 
|  | some LLVM DI builder methods are called to construct representation of a type. | 
|  |  | 
|  | The steps of this process are as follows: | 
|  |  | 
|  | 1. LLVM needs changing. | 
|  |  | 
|  | LLVM does not emit Interface types at all, so this needs to be implemented in the LLVM first. | 
|  |  | 
|  | Get sign off on LLVM maintainers that this is a good idea. | 
|  |  | 
|  | 2. Change the DWARF extension. | 
|  |  | 
|  | 3. Update the debuggers. | 
|  |  | 
|  | Update DWARF readers, expression evaluators. | 
|  |  | 
|  | 4. Update Rust compiler. | 
|  |  | 
|  | Change it to emit this new information. | 
|  |  | 
|  | ### Procedural macro stepping | 
|  |  | 
|  | A deeply profound question is that how do you actually debug a procedural macro? | 
|  | What is the location you emit for a macro expansion? Consider some of the following cases - | 
|  |  | 
|  | * You can emit location of the invocation of the macro. | 
|  | * You can emit the location of the definition of the macro. | 
|  | * You can emit locations of the content of the macro. | 
|  |  | 
|  | RFC: [https://github.com/rust-lang/rfcs/pull/2117] | 
|  |  | 
|  | Focus is to let macros decide what to do. This can be achieved by having some kind of attribute | 
|  | that lets the macro tell the compiler where the line marker should be. This affects where you | 
|  | set the breakpoints and what happens when you step it. | 
|  |  | 
|  | ## Source file checksums in debug info | 
|  |  | 
|  | Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that | 
|  | contributed to the associated binary. | 
|  |  | 
|  | The cryptographic hash can be used by a debugger to verify that the source file matches the | 
|  | executable. If the source file does not match, the debugger can provide a warning to the user. | 
|  |  | 
|  | The hash can also be used to prove that a given source file has not been modified since it was | 
|  | used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities, | 
|  | using SHA256 is recommended for this application. | 
|  |  | 
|  | The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in | 
|  | the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata. | 
|  |  | 
|  | A default hashing algorithm is set in the target specification. This allows the target to | 
|  | specify the best hash available, since not all targets support all hash algorithms. | 
|  |  | 
|  | The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=` | 
|  | command-line option. | 
|  |  | 
|  | #### DWARF 5 | 
|  | DWARF version 5 supports embedding an MD5 hash to validate the source file version in use. | 
|  | DWARF 5 - Section 6.2.4.1 opcode DW_LNCT_MD5 | 
|  |  | 
|  | #### LLVM | 
|  | LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node. | 
|  |  | 
|  | [LLVM DIFile documentation](https://llvm.org/docs/LangRef.html#difile) | 
|  |  | 
|  | #### Microsoft Visual C++ Compiler /ZH option | 
|  | The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH` | 
|  | compiler option. | 
|  |  | 
|  | [MSVC /ZH documentation](https://docs.microsoft.com/en-us/cpp/build/reference/zh) | 
|  |  | 
|  | #### Clang | 
|  | Clang always embeds an MD5 checksum, though this does not appear in documentation. | 
|  |  | 
|  | ## Future work | 
|  |  | 
|  | #### Name mangling changes | 
|  |  | 
|  | * New demangler in `libiberty` (gcc source tree). | 
|  | * New demangler in LLVM or LLDB. | 
|  |  | 
|  | **TODO**: Check the location of the demangler source. [#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157) | 
|  |  | 
|  | #### Reuse Rust compiler for expressions | 
|  |  | 
|  | This is an important idea because debuggers by and large do not try to implement type | 
|  | inference. You need to be much more explicit when you type into the debugger than your | 
|  | actual source code. So, you cannot just copy and paste an expression from your source | 
|  | code to debugger and expect the same answer but this would be nice. This can be helped | 
|  | by using compiler. | 
|  |  | 
|  | It is certainly doable but it is a large project. You certainly need a bridge to the | 
|  | debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang) | 
|  | have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC. | 
|  |  | 
|  | Both debuggers expression evaluation implement both a superset and a subset of Rust. | 
|  | They implement just the expression language, | 
|  | but they also add some extensions like GDB has convenience variables. | 
|  | Therefore, if you are taking this route, | 
|  | then you not only need to do this bridge, | 
|  | but may have to add some mode to let the compiler understand some extensions. | 
|  |  | 
|  | [Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4 | 
|  | [Debugging the Compiler]: compiler-debugging.md | 
|  | [debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger | 
|  | [Bison]: https://www.gnu.org/software/bison/ | 
|  | [ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html | 
|  | [rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki | 
|  | [DWARF]: http://dwarfstd.org | 
|  | [manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html | 
|  | [GDB Bugzilla]: https://sourceware.org/bugzilla/ | 
|  | [Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser | 
|  | [System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection | 
|  | [https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb | 
|  | [DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2 | 
|  | [https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html | 
|  | [https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117 | 
|  | [https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014 | 
|  | [https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457 | 
|  | [Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11 | 
|  | [https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb | 
|  | [https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project | 
|  | [PDB]: https://llvm.org/docs/PDB/index.html | 
|  | [symbol records]: https://llvm.org/docs/PDB/CodeViewSymbols.html | 
|  | [type records]: https://llvm.org/docs/PDB/CodeViewTypes.html | 
|  | [Windows Debugging Tools]: https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/ | 
|  | [`debugger_visualizer` attribute]: https://doc.rust-lang.org/nightly/reference/attributes/debugger.html#the-debugger_visualizer-attribute |