| # Backend Agnostic Codegen |
| |
| [`rustc_codegen_ssa`] |
| provides an abstract interface for all backends to implement, |
| namely LLVM, [Cranelift], and [GCC]. |
| |
| [Cranelift]: https://github.com/rust-lang/rustc_codegen_cranelift |
| [GCC]: https://github.com/rust-lang/rustc_codegen_gcc |
| [`rustc_codegen_ssa`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html |
| |
| Below is some background information on the refactoring that created this |
| abstract interface. |
| |
| ## Refactoring of `rustc_codegen_llvm` |
| by Denis Merigoux, October 23rd 2018 |
| |
| ### State of the code before the refactoring |
| |
| All the code related to the compilation of MIR into LLVM IR was contained |
| inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most |
| important elements: |
| * the `back` folder (7,800 LOC) implements the mechanisms for creating the |
| different object files and archives through LLVM, but also the communication |
| mechanisms for parallel code generation; |
| * the `debuginfo` (3,200 LOC) folder contains all code that passes debug |
| information down to LLVM; |
| * the `llvm` (2,200 LOC) folder defines the FFI necessary to communicate with |
| LLVM using the C++ API; |
| * the `mir` (4,300 LOC) folder implements the actual lowering from MIR to LLVM |
| IR; |
| * the `base.rs` (1,300 LOC) file contains some helper functions but also the |
| high-level code that launches the code generation and distributes the work; |
| * the `builder.rs` (1,200 LOC) file contains all the functions generating |
| individual LLVM IR instructions inside a basic block; |
| * the `common.rs` (450 LOC) file contains various helper functions and all the |
| functions generating LLVM static values; |
| * the `type_.rs` (300 LOC) defines most of the type translations to LLVM IR. |
| |
| The goal of this refactoring is to separate, inside this crate, code that is |
| specific to LLVM from code that can be reused for other rustc backends. For |
| instance, the `mir` folder is almost entirely backend-agnostic but it relies |
| heavily on other parts of the crate. The separation of the code must affect |
| neither the logic of the code nor its performance. |
| |
| For these reasons, the separation process involves two transformations that |
| have to be done at the same time for the resulting code to compile: |
| |
| 1. replace all the LLVM-specific types by generics inside function signatures |
| and structure definitions; |
| 2. encapsulate all functions calling the LLVM FFI inside a set of traits that |
| will define the interface between backend-agnostic code and the backend. |
| |
| While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new |
| traits and backend-agnostic code will be moved into `rustc_codegen_ssa` (name |
| suggestion by @eddyb). |
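
As a rough illustration of these two transformations, here is a minimal sketch. All names in it (`Backend`, `const_i32`, `add`, `double`, `ToyBackend`, `lower_example`) are hypothetical and chosen for this example only; they are not the traits or functions found in rustc. The backend's value type becomes an associated type, and the operations that would otherwise call the backend's FFI become trait methods.

```rust
// Hypothetical stand-in for a backend interface; not rustc's actual traits.
trait Backend {
    // Transformation 1: the backend's value type is an associated type
    // rather than a concrete LLVM type.
    type Value: Copy;

    // Transformation 2: operations that would call into the backend's FFI
    // are exposed as trait methods.
    fn const_i32(&mut self, n: i32) -> Self::Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

// Backend-agnostic code: written once, against the trait only.
fn double<B: Backend>(backend: &mut B, v: B::Value) -> B::Value {
    backend.add(v, v)
}

// A toy backend that records instructions as text and uses the index of
// the instruction that produced a value as the value itself.
struct ToyBackend {
    instructions: Vec<String>,
}

impl Backend for ToyBackend {
    type Value = usize;

    fn const_i32(&mut self, n: i32) -> usize {
        self.instructions.push(format!("const {n}"));
        self.instructions.len() - 1
    }

    fn add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.instructions.push(format!("add %{lhs}, %{rhs}"));
        self.instructions.len() - 1
    }
}

// Drives the backend-agnostic `double` with the toy backend.
fn lower_example() -> Vec<String> {
    let mut backend = ToyBackend { instructions: Vec::new() };
    let x = backend.const_i32(21);
    double(&mut backend, x);
    backend.instructions
}
```

In this sketch `double` never names `ToyBackend`, so a second backend would only need its own `impl Backend`.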
| |
| ### Generic types and structures |
| |
| @irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a |
| generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This |
| work has been extended to all structures inside the `mir` folder and elsewhere, |
| as well as to LLVM's `BasicBlock` and `Type` types. |
| |
| The two most important structures for the LLVM codegen are `CodegenCx` and |
| `Builder`. They are parametrized by multiple lifetime parameters and the type |
| for `Value`. |
| |
| ```rust,ignore |
| struct CodegenCx<'ll, 'tcx> { |
| /* ... */ |
| } |
| |
| struct Builder<'a, 'll, 'tcx> { |
| cx: &'a CodegenCx<'ll, 'tcx>, |
| /* ... */ |
| } |
| ``` |
| |
| `CodegenCx` is used to compile one codegen-unit that can contain multiple |
| functions, whereas `Builder` is created to compile one basic block. |
| |
| The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime |
| parameters, which correspond to the following: |
| * `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt` |
| containing the program's information; |
| * `'a` is a short-lived reference to a `CodegenCx` or another object inside a |
| struct; |
| * `'ll` is the lifetime of references to LLVM objects such as `Value` or |
| `Type`. |
| |
| Although there are already many lifetime parameters in the code, making it |
| generic uncovered situations where the borrow-checker was passing only due to |
| the special nature of the LLVM objects manipulated (they are extern pointers). |
| For instance, an additional lifetime parameter had to be added to |
| `LocalAnalyzer` in `analyze.rs`, leading to the definition: |
| |
| ```rust,ignore |
| struct LocalAnalyzer<'mir, 'a, 'tcx> { |
| /* ... */ |
| } |
| ``` |
| |
| However, the two most important structures, `CodegenCx` and `Builder`, are not |
| defined in the backend-agnostic code. Indeed, their content is highly specific |
| to the backend, and it makes more sense to leave their definition to the backend |
| implementor than to offer only a narrow slot, via a generic field, for the |
| backend's context. |
| |
| ### Traits and interface |
| |
| Because they have to be defined by the backend, `CodegenCx` and `Builder` will |
| be the structures implementing all the traits defining the backend's interface. |
| These traits are defined in the folder `rustc_codegen_ssa/traits` and all the |
| backend-agnostic code is parametrized by them. For instance, let us explain how |
| a function in `base.rs` is parametrized: |
| |
| ```rust,ignore |
| pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>( |
| cx: &'a Bx::CodegenCx, |
| instance: Instance<'tcx> |
| ) { |
| /* ... */ |
| } |
| ``` |
| |
| In this signature, we have the two lifetime parameters explained earlier and |
| the master type `Bx`, which satisfies the trait `BuilderMethods` corresponding |
| to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait |
| defines an associated type `Bx::CodegenCx` that itself satisfies the |
| `CodegenMethods` trait implemented by the struct `CodegenCx`. |
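
To make the relationship between `Bx` and `Bx::CodegenCx` more concrete, here is a self-contained sketch. It reuses the names `BuilderMethods`, `CodegenMethods` and `codegen_instance` from the text, but the trait bodies are drastically simplified assumptions (the `'tcx` parameter and all supertraits are dropped, and `instance` is just a string), and `ToyCx`, `ToyBuilder`, `unit_name`, `emit`, `finish` and `demo` are hypothetical.

```rust
// Simplified stand-ins that borrow the names used in the text; the real
// traits have many more methods, supertraits and a `'tcx` parameter.
trait CodegenMethods {
    fn unit_name(&self) -> &str;
}

trait BuilderMethods<'a>: Sized {
    // The context type is chosen by the backend and must itself satisfy
    // `CodegenMethods`.
    type CodegenCx: CodegenMethods + 'a;

    fn build(cx: &'a Self::CodegenCx) -> Self;
    fn cx(&self) -> &'a Self::CodegenCx;
    fn emit(&mut self, instruction: &str);
    fn finish(self) -> Vec<String>;
}

// Backend-agnostic function: only `Bx` is a type parameter; the context
// type is derived from it as `Bx::CodegenCx`.
fn codegen_instance<'a, Bx: BuilderMethods<'a>>(
    cx: &'a Bx::CodegenCx,
    instance: &str,
) -> Vec<String> {
    let mut bx = Bx::build(cx);
    let header = format!("; {} in {}", instance, bx.cx().unit_name());
    bx.emit(&header);
    bx.emit("ret");
    bx.finish()
}

// Hypothetical toy backend: its context and builder are ordinary structs.
struct ToyCx {
    name: String,
}

impl CodegenMethods for ToyCx {
    fn unit_name(&self) -> &str {
        &self.name
    }
}

struct ToyBuilder<'a> {
    cx: &'a ToyCx,
    out: Vec<String>,
}

impl<'a> BuilderMethods<'a> for ToyBuilder<'a> {
    type CodegenCx = ToyCx;

    fn build(cx: &'a ToyCx) -> Self {
        ToyBuilder { cx, out: Vec::new() }
    }
    fn cx(&self) -> &'a ToyCx {
        self.cx
    }
    fn emit(&mut self, instruction: &str) {
        self.out.push(instruction.to_string());
    }
    fn finish(self) -> Vec<String> {
        self.out
    }
}

// The caller picks the backend by naming its builder type.
fn demo() -> Vec<String> {
    let cx = ToyCx { name: "cgu0".to_string() };
    codegen_instance::<ToyBuilder<'_>>(&cx, "main")
}
```

Only the builder type is named at the call site; the context type is determined by the `impl`, which is what allows the backend to keep full control over both structures.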
| |
| On the trait side, here is an example with part of the definition of |
| `BuilderMethods` in `traits/builder.rs`: |
| |
| ```rust,ignore |
| pub trait BuilderMethods<'a, 'tcx>: |
| HasCodegen<'tcx> |
| + DebugInfoBuilderMethods<'tcx> |
| + ArgTypeMethods<'tcx> |
| + AbiBuilderMethods<'tcx> |
| + IntrinsicCallMethods<'tcx> |
| + AsmBuilderMethods<'tcx> |
| { |
| fn new_block<'b>( |
| cx: &'a Self::CodegenCx, |
| llfn: Self::Function, |
| name: &'b str |
| ) -> Self; |
| /* ... */ |
| fn cond_br( |
| &mut self, |
| cond: Self::Value, |
| then_llbb: Self::BasicBlock, |
| else_llbb: Self::BasicBlock, |
| ); |
| /* ... */ |
| } |
| ``` |
| |
| Finally, a master structure implementing the `ExtraBackendMethods` trait is |
| used for high-level codegen-driving functions like `codegen_crate` in |
| `base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`. |
| `ExtraBackendMethods` should be implemented by the same structure that |
| implements the `CodegenBackend` trait defined in |
| `rustc_codegen_utils/codegen_backend.rs`. |
| |
| During the traitification process, certain functions have been converted from |
| methods of a local structure to methods of `CodegenCx` or `Builder` and a |
| corresponding `self` parameter has been added. Indeed, LLVM stores information |
| internally that it can access when called through its API. This information |
| does not show up in a Rust data structure carried around when these methods are |
| called. However, when implementing a Rust backend for `rustc`, these methods |
| will need information from `CodegenCx`, hence the additional parameter (unused |
| in the LLVM implementation of the trait). |
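
The following sketch illustrates why the extra `self` parameter matters. Everything in it is hypothetical (`TypeMethods` here is a one-method stand-in, and `OpaqueCx`, `InterningCx`, `type_int` and `same_type_twice` are invented for the example): one backend delegates to a library that keeps its own state and ignores `self`, while a backend written in Rust needs `self` to reach its own tables.

```rust
use std::cell::RefCell;

// Hypothetical one-method trait, far smaller than anything in rustc.
trait TypeMethods {
    type Type: Copy + PartialEq;
    fn type_int(&self, bits: u32) -> Self::Type;
}

// Stand-in for a backend whose underlying library keeps state internally:
// the method has a `self` parameter but never uses it.
struct OpaqueCx;

impl TypeMethods for OpaqueCx {
    type Type = u32;

    fn type_int(&self, bits: u32) -> u32 {
        // Imagine an FFI call here; it would need nothing from `self`.
        bits
    }
}

// Stand-in for a backend written in Rust: interned types live in the
// context, so the method has to go through `self`.
struct InterningCx {
    int_types: RefCell<Vec<u32>>,
}

impl TypeMethods for InterningCx {
    type Type = usize; // index into `int_types`

    fn type_int(&self, bits: u32) -> usize {
        let mut types = self.int_types.borrow_mut();
        if let Some(idx) = types.iter().position(|&b| b == bits) {
            return idx;
        }
        types.push(bits);
        types.len() - 1
    }
}

// Backend-agnostic caller: works with either context.
fn same_type_twice<Cx: TypeMethods>(cx: &Cx) -> bool {
    cx.type_int(32) == cx.type_int(32)
}
```

Both implementations satisfy the same signature, so the backend-agnostic caller does not need to know whether `self` is actually used.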
| |
| ### State of the code after the refactoring |
| |
| The traits offer an API which is very similar to the API of LLVM. This is not |
| the best solution since LLVM has a very special way of doing things: when |
| adding another backend, the trait definitions might need to change in order to |
| offer more flexibility. |
| |
| However, the current separation between backend-agnostic and LLVM-specific code |
| has allowed the reuse of a significant part of the old `rustc_codegen_llvm`. |
| Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the |
| most important elements: |
| |
| * `back` folder: 3,800 (BA) vs 4,100 (LLVM); |
| * `mir` folder: 4,400 (BA) vs 0 (LLVM); |
| * `base.rs`: 1,100 (BA) vs 250 (LLVM); |
| * `builder.rs`: 1,400 (BA) vs 0 (LLVM); |
| * `common.rs`: 350 (BA) vs 350 (LLVM). |
| |
| The `debuginfo` folder has been left almost untouched by the splitting and is |
| specific to LLVM. Only its high-level features have been traitified. |
| |
| The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the |
| 27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new |
| 18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized |
| `rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of |
| approximately 10,000 LOC that would otherwise have had to be duplicated between |
| the multiple backends of `rustc`. |
| |
| The refactored version of `rustc`'s backend introduced no regressions in the |
| test suite or in performance benchmarks, which is consistent with the nature |
| of a refactoring that used only compile-time parametricity (no trait |
| objects). |
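
For readers less familiar with the distinction, the sketch below contrasts the two styles on a hypothetical `Emit` trait (`Emit`, `AddOne`, `run_generic` and `run_dyn` are invented for this example). With a generic parameter, the function is monomorphized for each concrete type and the call is resolved at compile time, which generally lets the compiler inline it; with a trait object, the call goes through a vtable at run time.

```rust
// Hypothetical trait used only to contrast the two dispatch styles.
trait Emit {
    fn emit(&self, n: u32) -> u32;
}

struct AddOne;

impl Emit for AddOne {
    fn emit(&self, n: u32) -> u32 {
        n + 1
    }
}

// Compile-time parametricity: one copy of this function is generated per
// concrete `E`, and `e.emit` is a direct call.
fn run_generic<E: Emit>(e: &E, n: u32) -> u32 {
    e.emit(n)
}

// Trait object: a single copy of this function exists, and `e.emit` is
// looked up in a vtable at run time.
fn run_dyn(e: &dyn Emit, n: u32) -> u32 {
    e.emit(n)
}
```

Both functions compute the same result; only the dispatch mechanism differs, which is why a refactoring restricted to the first style is not expected to change run-time behaviour.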