| # Code generation attributes |
| |
| The following [attributes] are used for controlling code generation. |
| |
| ## Optimization hints |
| |
| The `cold` and `inline` [attributes] give suggestions to generate code in a |
| way that may be faster than what it would do without the hint. The attributes |
| are only hints, and may be ignored. |
| |
| Both attributes can be used on [functions]. When applied to a function in a |
| [trait], they apply only to that function when used as a default function for |
| a trait implementation and not to all trait implementations. The attributes |
| have no effect on a trait function without a body. |
| |
| ### The `inline` attribute |
| |
| The *`inline` [attribute]* suggests that a copy of the attributed function |
| should be placed in the caller, rather than generating code to call the |
| function where it is defined. |
| |
| > ***Note***: The `rustc` compiler automatically inlines functions based on |
| > internal heuristics. Incorrectly inlining functions can make the program |
| > slower, so this attribute should be used with care. |
| |
| There are three ways to use the inline attribute: |
| |
| * `#[inline]` *suggests* performing an inline expansion. |
| * `#[inline(always)]` *suggests* that an inline expansion should always be |
| performed. |
| * `#[inline(never)]` *suggests* that an inline expansion should never be |
| performed. |
| |
| > ***Note***: `#[inline]` in every form is a hint, with no *requirements* |
| > on the language to place a copy of the attributed function in the caller. |
| |
| ### The `cold` attribute |
| |
| The *`cold` [attribute]* suggests that the attributed function is unlikely to |
| be called. |
| |
| ## The `no_builtins` attribute |
| |
| The *`no_builtins` [attribute]* may be applied at the crate level to disable |
| optimizing certain code patterns to invocations of library functions that are |
| assumed to exist. |
| |
| ## The `target_feature` attribute |
| |
| The *`target_feature` [attribute]* may be applied to a function to |
| enable code generation of that function for specific platform architecture |
| features. It uses the [_MetaListNameValueStr_] syntax with a single key of |
| `enable` whose value is a string of comma-separated feature names to enable. |
| |
| ```rust |
| # #[cfg(target_feature = "avx2")] |
| #[target_feature(enable = "avx2")] |
| unsafe fn foo_avx2() {} |
| ``` |
| |
| Each [target architecture] has a set of features that may be enabled. It is an |
| error to specify a feature for a target architecture that the crate is not |
| being compiled for. |
| |
| It is [undefined behavior] to call a function that is compiled with a feature |
| that is not supported on the current platform the code is running on, *except* |
| if the platform explicitly documents this to be safe. |
| |
| Functions marked with `target_feature` are not inlined into a context that |
| does not support the given features. The `#[inline(always)]` attribute may not |
| be used with a `target_feature` attribute. |
| |
| ### Available features |
| |
| The following is a list of the available feature names. |
| |
| #### `x86` or `x86_64` |
| |
| Executing code with unsupported features is undefined behavior on this platform. |
| Hence this platform requires that `#[target_feature]` is only applied to [`unsafe` |
| functions][unsafe function]. |
| |
| Feature | Implicitly Enables | Description |
| ------------|--------------------|------------------- |
| `adx` | | [ADX] — Multi-Precision Add-Carry Instruction Extensions |
| `aes` | `sse2` | [AES] — Advanced Encryption Standard |
| `avx` | `sse4.2` | [AVX] — Advanced Vector Extensions |
| `avx2` | `avx` | [AVX2] — Advanced Vector Extensions 2 |
| `bmi1` | | [BMI1] — Bit Manipulation Instruction Sets |
| `bmi2` | | [BMI2] — Bit Manipulation Instruction Sets 2 |
| `fma` | `avx` | [FMA3] — Three-operand fused multiply-add |
| `fxsr` | | [`fxsave`] and [`fxrstor`] — Save and restore x87 FPU, MMX Technology, and SSE State |
| `lzcnt` | | [`lzcnt`] — Leading zeros count |
| `pclmulqdq` | `sse2` | [`pclmulqdq`] — Packed carry-less multiplication quadword |
| `popcnt` | | [`popcnt`] — Count of bits set to 1 |
| `rdrand` | | [`rdrand`] — Read random number |
| `rdseed` | | [`rdseed`] — Read random seed |
| `sha` | `sse2` | [SHA] — Secure Hash Algorithm |
| `sse` | | [SSE] — Streaming <abbr title="Single Instruction Multiple Data">SIMD</abbr> Extensions |
| `sse2` | `sse` | [SSE2] — Streaming SIMD Extensions 2 |
| `sse3` | `sse2` | [SSE3] — Streaming SIMD Extensions 3 |
| `sse4.1` | `ssse3` | [SSE4.1] — Streaming SIMD Extensions 4.1 |
| `sse4.2` | `sse4.1` | [SSE4.2] — Streaming SIMD Extensions 4.2 |
| `ssse3` | `sse3` | [SSSE3] — Supplemental Streaming SIMD Extensions 3 |
| `xsave` | | [`xsave`] — Save processor extended states |
| `xsavec` | | [`xsavec`] — Save processor extended states with compaction |
| `xsaveopt` | | [`xsaveopt`] — Save processor extended states optimized |
| `xsaves` | | [`xsaves`] — Save processor extended states supervisor |
| |
| <!-- Keep links near each table to make it easier to move and update. --> |
| |
| [ADX]: https://en.wikipedia.org/wiki/Intel_ADX |
| [AES]: https://en.wikipedia.org/wiki/AES_instruction_set |
| [AVX]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions |
| [AVX2]: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#AVX2 |
| [BMI1]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets |
| [BMI2]: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#BMI2 |
| [FMA3]: https://en.wikipedia.org/wiki/FMA_instruction_set |
| [`fxsave`]: https://www.felixcloutier.com/x86/fxsave |
| [`fxrstor`]: https://www.felixcloutier.com/x86/fxrstor |
| [`lzcnt`]: https://www.felixcloutier.com/x86/lzcnt |
| [`pclmulqdq`]: https://www.felixcloutier.com/x86/pclmulqdq |
| [`popcnt`]: https://www.felixcloutier.com/x86/popcnt |
| [`rdrand`]: https://en.wikipedia.org/wiki/RdRand |
| [`rdseed`]: https://en.wikipedia.org/wiki/RdRand |
| [SHA]: https://en.wikipedia.org/wiki/Intel_SHA_extensions |
| [SSE]: https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions |
| [SSE2]: https://en.wikipedia.org/wiki/SSE2 |
| [SSE3]: https://en.wikipedia.org/wiki/SSE3 |
| [SSE4.1]: https://en.wikipedia.org/wiki/SSE4#SSE4.1 |
| [SSE4.2]: https://en.wikipedia.org/wiki/SSE4#SSE4.2 |
| [SSSE3]: https://en.wikipedia.org/wiki/SSSE3 |
| [`xsave`]: https://www.felixcloutier.com/x86/xsave |
| [`xsavec`]: https://www.felixcloutier.com/x86/xsavec |
| [`xsaveopt`]: https://www.felixcloutier.com/x86/xsaveopt |
| [`xsaves`]: https://www.felixcloutier.com/x86/xsaves |
| |
| #### `aarch64` |
| |
| This platform requires that `#[target_feature]` is only applied to [`unsafe` |
| functions][unsafe function]. |
| |
| Further documentation on these features can be found in the [ARM Architecture |
| Reference Manual], or elsewhere on [developer.arm.com]. |
| |
| [ARM Architecture Reference Manual]: https://developer.arm.com/documentation/ddi0487/latest |
| [developer.arm.com]: https://developer.arm.com |
| |
| > ***Note***: The following pairs of features should both be marked as enabled |
| > or disabled together if used: |
| > - `fp` and `neon`, in order facilitate inlining in more places, among other reasons. |
| > - `paca` and `pacg`, which LLVM currently implements as one feature. |
| |
| |
| Feature | Implicitly Enables | Feature Name |
| ---------------|--------------------|------------------- |
| `aes` | `neon` | FEAT_AES - Advanced <abbr title="Single Instruction Multiple Data">SIMD</abbr> AES instructions |
| `bf16` | | FEAT_BF16 - BFloat16 instructions |
| `bti` | | FEAT_BTI - Branch Target Identification |
| `crc` | | FEAT_CRC - CRC32 checksum instructions |
| `dit` | | FEAT_DIT - Data Independent Timing instructions |
| `dotprod` | | FEAT_DotProd - Advanced SIMD Int8 dot product instructions |
| `dpb` | | FEAT_DPB - Data cache clean to point of persistence |
| `dpb2` | | FEAT_DPB2 - Data cache clean to point of deep persistence |
| `f32mm` | `sve` | FEAT_F32MM - SVE single-precision FP matrix multiply instruction |
| `f64mm` | `sve` | FEAT_F64MM - SVE double-precision FP matrix multiply instruction |
| `fcma` | `neon` | FEAT_FCMA - Floating point complex number support |
| `fhm` | `fp16` | FEAT_FHM - Half-precision FP FMLAL instructions |
| `flagm` | | FEAT_FlagM - Conditional flag manipulation |
| `fp` | | FEAT_FP - Floating point extension |
| `fp16` | `fp`, `neon` | FEAT_FP16 - Half-precision FP data processing |
| `frintts` | | FEAT_FRINTTS - Floating-point to int helper instructions |
| `i8mm` | | FEAT_I8MM - Int8 Matrix Multiplication |
| `jsconv` | `fp`, `neon` | FEAT_JSCVT - JavaScript conversion instruction |
| `lse` | | FEAT_LSE - Large System Extension |
| `lor` | | FEAT_LOR - Limited Ordering Regions extension |
| `mte` | | FEAT_MTE - Memory Tagging Extension |
| `neon` | `fp` | FEAT_AdvSIMD - Advanced SIMD extension |
| `pan` | | FEAT_PAN - Privileged Access-Never extension |
| `paca` | | FEAT_PAuth - Pointer Authentication (address authentication) |
| `pacg` | | FEAT_PAuth - Pointer Authentication (generic authentication) |
| `pmuv3` | | FEAT_PMUv3 - Performance Monitors extension (v3) |
| `rand` | | FEAT_RNG - Random Number Generator |
| `ras` | | FEAT_RAS - Reliability, Availability and Serviceability extension |
| `rcpc` | | FEAT_LRCPC - Release consistent Processor Consistent |
| `rcpc2` | `rcpc` | FEAT_LRCPC2 - RcPc with immediate offsets |
| `rdm` | | FEAT_RDM - Rounding Double Multiply accumulate |
| `sb` | | FEAT_SB - Speculation Barrier |
| `sha2` | `neon` | FEAT_SHA1 & FEAT_SHA256 - Advanced SIMD SHA instructions |
| `sha3` | `sha2` | FEAT_SHA512 & FEAT_SHA3 - Advanced SIMD SHA instructions |
| `sm4` | `neon` | FEAT_SM3 & FEAT_SM4 - Advanced SIMD SM3/4 instructions |
| `spe` | | FEAT_SPE - Statistical Profiling Extension |
| `ssbs` | | FEAT_SSBS - Speculative Store Bypass Safe |
| `sve` | `fp16` | FEAT_SVE - Scalable Vector Extension |
| `sve2` | `sve` | FEAT_SVE2 - Scalable Vector Extension 2 |
| `sve2-aes` | `sve2`, `aes` | FEAT_SVE_AES - SVE AES instructions |
| `sve2-sm4` | `sve2`, `sm4` | FEAT_SVE_SM4 - SVE SM4 instructions |
| `sve2-sha3` | `sve2`, `sha3` | FEAT_SVE_SHA3 - SVE SHA3 instructions |
| `sve2-bitperm` | `sve2` | FEAT_SVE_BitPerm - SVE Bit Permute |
| `tme` | | FEAT_TME - Transactional Memory Extension |
| `vh` | | FEAT_VHE - Virtualization Host Extensions |
| |
| #### `wasm32` or `wasm64` |
| |
| `#[target_feature]` may be used with both safe and |
| [`unsafe` functions][unsafe function] on Wasm platforms. It is impossible to |
| cause undefined behavior via the `#[target_feature]` attribute because |
| attempting to use instructions unsupported by the Wasm engine will fail at load |
| time without the risk of being interpreted in a way different from what the |
| compiler expected. |
| |
| Feature | Description |
| ------------|------------------- |
| `simd128` | [WebAssembly simd proposal][simd128] |
| |
| [simd128]: https://github.com/webassembly/simd |
| |
| ### Additional information |
| |
| See the [`target_feature` conditional compilation option] for selectively |
| enabling or disabling compilation of code based on compile-time settings. Note |
| that this option is not affected by the `target_feature` attribute, and is |
| only driven by the features enabled for the entire crate. |
| |
| See the [`is_x86_feature_detected`] or [`is_aarch64_feature_detected`] macros |
| in the standard library for runtime feature detection on these platforms. |
| |
| > Note: `rustc` has a default set of features enabled for each target and CPU. |
| > The CPU may be chosen with the [`-C target-cpu`] flag. Individual features |
| > may be enabled or disabled for an entire crate with the |
| > [`-C target-feature`] flag. |
| |
| ## The `track_caller` attribute |
| |
| The `track_caller` attribute may be applied to any function with [`"Rust"` ABI][rust-abi] |
| with the exception of the entry point `fn main`. When applied to functions and methods in |
| trait declarations, the attribute applies to all implementations. If the trait provides a |
| default implementation with the attribute, then the attribute also applies to override implementations. |
| |
| When applied to a function in an `extern` block the attribute must also be applied to any linked |
| implementations, otherwise undefined behavior results. When applied to a function which is made |
| available to an `extern` block, the declaration in the `extern` block must also have the attribute, |
| otherwise undefined behavior results. |
| |
| ### Behavior |
| |
| Applying the attribute to a function `f` allows code within `f` to get a hint of the [`Location`] of |
| the "topmost" tracked call that led to `f`'s invocation. At the point of observation, an |
| implementation behaves as if it walks up the stack from `f`'s frame to find the nearest frame of an |
| *unattributed* function `outer`, and it returns the [`Location`] of the tracked call in `outer`. |
| |
| ```rust |
| #[track_caller] |
| fn f() { |
| println!("{}", std::panic::Location::caller()); |
| } |
| ``` |
| |
| > Note: `core` provides [`core::panic::Location::caller`] for observing caller locations. It wraps |
| > the [`core::intrinsics::caller_location`] intrinsic implemented by `rustc`. |
| |
| > Note: because the resulting `Location` is a hint, an implementation may halt its walk up the stack |
| > early. See [Limitations](#limitations) for important caveats. |
| |
| #### Examples |
| |
| When `f` is called directly by `calls_f`, code in `f` observes its callsite within `calls_f`: |
| |
| ```rust |
| # #[track_caller] |
| # fn f() { |
| # println!("{}", std::panic::Location::caller()); |
| # } |
| fn calls_f() { |
| f(); // <-- f() prints this location |
| } |
| ``` |
| |
| When `f` is called by another attributed function `g` which is in turn called by `calls_g`, code in |
| both `f` and `g` observes `g`'s callsite within `calls_g`: |
| |
| ```rust |
| # #[track_caller] |
| # fn f() { |
| # println!("{}", std::panic::Location::caller()); |
| # } |
| #[track_caller] |
| fn g() { |
| println!("{}", std::panic::Location::caller()); |
| f(); |
| } |
| |
| fn calls_g() { |
| g(); // <-- g() prints this location twice, once itself and once from f() |
| } |
| ``` |
| |
| When `g` is called by another attributed function `h` which is in turn called by `calls_h`, all code |
| in `f`, `g`, and `h` observes `h`'s callsite within `calls_h`: |
| |
| ```rust |
| # #[track_caller] |
| # fn f() { |
| # println!("{}", std::panic::Location::caller()); |
| # } |
| # #[track_caller] |
| # fn g() { |
| # println!("{}", std::panic::Location::caller()); |
| # f(); |
| # } |
| #[track_caller] |
| fn h() { |
| println!("{}", std::panic::Location::caller()); |
| g(); |
| } |
| |
| fn calls_h() { |
| h(); // <-- prints this location three times, once itself, once from g(), once from f() |
| } |
| ``` |
| |
| And so on. |
| |
| ### Limitations |
| |
| This information is a hint and implementations are not required to preserve it. |
| |
| In particular, coercing a function with `#[track_caller]` to a function pointer creates a shim which |
| appears to observers to have been called at the attributed function's definition site, losing actual |
| caller information across virtual calls. A common example of this coercion is the creation of a |
| trait object whose methods are attributed. |
| |
| > Note: The aforementioned shim for function pointers is necessary because `rustc` implements |
| > `track_caller` in a codegen context by appending an implicit parameter to the function ABI, but |
| > this would be unsound for an indirect call because the parameter is not a part of the function's |
| > type and a given function pointer type may or may not refer to a function with the attribute. The |
| > creation of a shim hides the implicit parameter from callers of the function pointer, preserving |
| > soundness. |
| |
| [_MetaListNameValueStr_]: ../attributes.md#meta-item-attribute-syntax |
| [`-C target-cpu`]: ../../rustc/codegen-options/index.html#target-cpu |
| [`-C target-feature`]: ../../rustc/codegen-options/index.html#target-feature |
| [`is_x86_feature_detected`]: ../../std/arch/macro.is_x86_feature_detected.html |
| [`is_aarch64_feature_detected`]: ../../std/arch/macro.is_aarch64_feature_detected.html |
| [`target_feature` conditional compilation option]: ../conditional-compilation.md#target_feature |
| [attribute]: ../attributes.md |
| [attributes]: ../attributes.md |
| [functions]: ../items/functions.md |
| [target architecture]: ../conditional-compilation.md#target_arch |
| [trait]: ../items/traits.md |
| [undefined behavior]: ../behavior-considered-undefined.md |
| [unsafe function]: ../unsafe-functions.md |
| [rust-abi]: ../items/external-blocks.md#abi |
| [`core::intrinsics::caller_location`]: ../../core/intrinsics/fn.caller_location.html |
| [`core::panic::Location::caller`]: ../../core/panic/struct.Location.html#method.caller |
| [`Location`]: ../../core/panic/struct.Location.html |