 # From MIR to binaries

 All of the preceding chapters of this guide have one thing in common:
 we never generated any executable machine code at all!
 With this chapter, all of that changes.

 So far,
 we've shown how the compiler can take raw source code in text format
 and transform it into [MIR].
 We have also shown how the compiler does various
 analyses on the code to detect things like type or lifetime errors.
 Now, we will finally take the MIR and produce some executable machine code.

 [MIR]: ./mir/index.md

 > NOTE: This part of a compiler is often called the _backend_.
 > The term is a bit overloaded because in the compiler source,
 > it usually refers to the "codegen backend" (i.e. LLVM, Cranelift, or GCC).
 > When you see the word "backend" in this part,
 > we are usually referring to the "codegen backend".

 So what do we need to do?

 1. First, we need to collect the set of things to generate code for.
    In particular,
    we need to find out which concrete types to substitute for generic ones,
    since we need to generate code for the concrete types.
    Generating code for the concrete types
    (i.e. emitting a copy of the code for each concrete type) is called _monomorphization_,
    so the process of collecting all the concrete types is called _monomorphization collection_.
 2. Next, we need to actually lower the MIR to a codegen IR
    (usually LLVM IR) for each concrete type we collected.
 3. Finally, we need to invoke the codegen backend,
    which runs a bunch of optimization passes,
    generates executable code,
    and links together an executable binary.
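
To make step 1 more concrete, here is a minimal sketch of what monomorphization means at the source level.
The function names `largest_i32` and `largest_f64` are hypothetical, hand-written stand-ins for the copies the compiler conceptually produces;
the compiler's actual output is machine code with mangled symbol names, not Rust source.

```rust
// One generic definition in the source.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
}

// Hypothetical hand-written equivalents of the two instantiations used in
// `main`: conceptually, one copy of the body per concrete type.
fn largest_i32(items: &[i32]) -> Option<i32> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
}

fn largest_f64(items: &[f64]) -> Option<f64> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |acc, x| if x > acc { x } else { acc }))
}

fn main() {
    let ints = [3, 7, 2];
    let floats = [1.5, 0.25];

    // These two calls are what a collector would discover:
    // `largest::<i32>` and `largest::<f64>` both need code generated.
    assert_eq!(largest(&ints), largest_i32(&ints));
    assert_eq!(largest(&floats), largest_f64(&floats));
}
```

In this sketch, the set of things to generate code for contains two instances of `largest`, because those are the only concrete types it is called with.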

 [codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html

 The code for codegen is actually a bit complex due to a few factors:

 - Support for multiple codegen backends (LLVM, Cranelift, and GCC).
   We try to share as much backend code between them as possible,
   so a lot of it is generic over the codegen implementation.
   This means that there are often a lot of layers of abstraction.
 - Codegen happens asynchronously in another thread for performance.
 - The actual codegen is done by a third-party library (one of the three backends).
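
The first two factors can be illustrated with a toy sketch: a driver written once, generic over a backend trait, that does its work on a separate thread.
Everything here (`Backend`, `ToyLlvm`, `ToyCranelift`, `codegen_all`) is hypothetical and invented for illustration;
the real traits in `rustc_codegen_ssa` are considerably more involved.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a codegen backend; not a real rustc trait.
trait Backend {
    fn name(&self) -> &'static str;
    fn lower(&self, item: &str) -> String;
}

struct ToyLlvm;
struct ToyCranelift;

impl Backend for ToyLlvm {
    fn name(&self) -> &'static str {
        "toy-llvm"
    }
    fn lower(&self, item: &str) -> String {
        format!("define @{item}")
    }
}

impl Backend for ToyCranelift {
    fn name(&self) -> &'static str {
        "toy-cranelift"
    }
    fn lower(&self, item: &str) -> String {
        format!("function %{item}")
    }
}

/// Backend-agnostic driver: written once, generic over `B`.
/// The lowering runs on a worker thread and results come back over a channel.
fn codegen_all<B: Backend + Send + 'static>(backend: B, items: Vec<String>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || {
        for item in &items {
            tx.send(backend.lower(item)).expect("receiver dropped");
        }
        // `tx` is dropped when the closure returns, which closes the channel.
    });
    // Collect results until the worker closes the channel.
    let out: Vec<String> = rx.iter().collect();
    worker.join().expect("worker thread panicked");
    out
}

fn main() {
    let items = vec!["main".to_string(), "helper".to_string()];
    println!("{}: {:?}", ToyLlvm.name(), codegen_all(ToyLlvm, items.clone()));
    println!("{}: {:?}", ToyCranelift.name(), codegen_all(ToyCranelift, items));
}
```

The driver never names a concrete backend, which is the same trade-off described above: shared code at the cost of an extra layer of abstraction.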

 Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code,
 while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM codegen.

 [ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
 [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html

 At a very high level, the entry point is
 [`rustc_codegen_ssa::base::codegen_crate`][codegen1].
 This function starts the process discussed in the rest of this chapter.