| # PhantomData |
| |
| When working with unsafe code, we can often end up in a situation where |
| types or lifetimes are logically associated with a struct, but not actually |
| part of a field. This most commonly occurs with lifetimes. For instance, the |
| `Iter` for `&'a [T]` is (approximately) defined as follows: |
| |
| ```rust,compile_fail |
| struct Iter<'a, T: 'a> { |
| ptr: *const T, |
| end: *const T, |
| } |
| ``` |
| |
| However, because `'a` is unused within the struct's body, it's *unbounded*. |
| [Because of the troubles this has historically caused][unused-param], |
| unbounded lifetimes and types are *forbidden* in struct definitions. |
| Therefore we must somehow refer to these types in the body. |
| Correctly doing this is necessary to have correct variance and drop checking. |
| |
| [unused-param]: https://rust-lang.github.io/rfcs/0738-variance.html#the-corner-case-unused-parameters-and-parameters-that-are-only-used-unsafely |
| |
| We do this using `PhantomData`, which is a special marker type. `PhantomData` |
| consumes no space, but simulates a field of the given type for the purpose of |
| static analysis. This was deemed to be less error-prone than explicitly telling |
| the type system the kind of variance that you want, while also providing other |
| useful things such as auto traits and the information needed by drop check. |
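
This small sketch checks the "consumes no space" claim. `WithMarker` and `WithoutMarker` are hypothetical types made up for this illustration. The size comparison relies on zero-sized fields adding no bytes. That holds in practice, although Rust's default struct layout is formally unspecified.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical type: a raw pointer plus a `PhantomData` marker.
#[allow(dead_code)]
struct WithMarker<'a, T> {
    ptr: *const T,
    _marker: PhantomData<&'a T>,
}

// Hypothetical type: the same pointer without the marker, for comparison.
#[allow(dead_code)]
struct WithoutMarker<T> {
    ptr: *const T,
}

fn main() {
    // `PhantomData<T>` is zero-sized, whatever `T` is...
    assert_eq!(size_of::<PhantomData<String>>(), 0);
    // ...so adding it as a field does not make the struct any bigger.
    assert_eq!(
        size_of::<WithMarker<'static, u64>>(),
        size_of::<WithoutMarker<u64>>()
    );
    println!("the marker field adds no bytes");
}
```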
| |
| `Iter` logically contains a bunch of `&'a T`s, so this is exactly what we tell |
| the `PhantomData` to simulate: |
| |
| ```rust |
| use std::marker; |
| |
| struct Iter<'a, T: 'a> { |
| ptr: *const T, |
| end: *const T, |
| _marker: marker::PhantomData<&'a T>, |
| } |
| ``` |
| |
| and that's it. The lifetime will be bounded, and your iterator will be covariant |
| over `'a` and `T`. Everything Just Works. |
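
The following sketch shows one way such an `Iter` could be constructed and advanced. The `new` constructor, the `next` body and the `shorten` helper are assumptions for illustration. The sketch also ignores zero-sized `T`, which the real `std::slice::Iter` handles specially. Note that `shorten` compiles only because `Iter` is covariant over `'a`.

```rust
use std::marker::PhantomData;

// Simplified slice iterator with the struct definition shown above.
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T> Iter<'a, T> {
    // Hypothetical constructor: borrow the slice for `'a`.
    fn new(slice: &'a [T]) -> Self {
        let range = slice.as_ptr_range();
        Iter { ptr: range.start, end: range.end, _marker: PhantomData }
    }
}

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.ptr == self.end {
            None
        } else {
            // SAFETY: `ptr` is strictly before `end`, so it points into the
            // slice borrowed for `'a`; the `PhantomData<&'a T>` ties the
            // returned reference to that borrow.
            let item = unsafe { &*self.ptr };
            // SAFETY: one-past-the-end is still a valid pointer to compute.
            self.ptr = unsafe { self.ptr.add(1) };
            Some(item)
        }
    }
}

// Compiles only because `Iter` is covariant over `'a`: an iterator over a
// longer borrow can be used where a shorter one is expected.
fn shorten<'short, 'long: 'short, T>(it: Iter<'long, T>) -> Iter<'short, T> {
    it
}

fn main() {
    let data = [1, 2, 3];
    let sum: i32 = shorten(Iter::new(&data)).sum();
    println!("sum = {}", sum);
}
```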
| |
| ## Generic parameters and drop-checking |
| |
| In the past, there was another thing to take into consideration. |
| |
| This very documentation used to say: |
| |
| > Another important example is Vec, which is (approximately) defined as follows: |
| > |
| > ```rust |
| > struct Vec<T> { |
| > data: *const T, // *const for variance! |
| > len: usize, |
| > cap: usize, |
| > } |
| > ``` |
| > |
| > Unlike the previous example, it *appears* that everything is exactly as we |
| > want. Every generic argument to Vec shows up in at least one field. |
| > Good to go! |
| > |
| > Nope. |
| > |
| > The drop checker will generously determine that `Vec<T>` does not own any values |
| > of type T. This will in turn make it conclude that it doesn't need to worry |
| > about Vec dropping any T's in its destructor for determining drop check |
| > soundness. This will in turn allow people to create unsoundness using |
| > Vec's destructor. |
| > |
| > In order to tell the drop checker that we *do* own values of type T, and |
| > therefore may drop some T's when *we* drop, we must add an extra `PhantomData` |
| > saying exactly that: |
| > |
| > ```rust |
| > use std::marker; |
| > |
| > struct Vec<T> { |
| > data: *const T, // *const for variance! |
| > len: usize, |
| > cap: usize, |
| > _owns_T: marker::PhantomData<T>, |
| > } |
| > ``` |
| |
| But ever since [RFC 1238](https://rust-lang.github.io/rfcs/1238-nonparametric-dropck.html), |
| **this is no longer true nor necessary**. |
| |
| If you were to write: |
| |
| ```rust |
| struct Vec<T> { |
| data: *const T, // `*const` for variance! |
| len: usize, |
| cap: usize, |
| } |
| |
| # #[cfg(any())] |
| impl<T> Drop for Vec<T> { /* … */ } |
| ``` |
| |
| then the existence of that `impl<T> Drop for Vec<T>` makes Rust consider |
| that `Vec<T>` _owns_ values of type `T` (more precisely: may use values of type `T` |
| in its `Drop` implementation), and Rust will thus not allow them to _dangle_ should a |
| `Vec<T>` be dropped. |
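
This sketch shows the effect of a `Drop` impl on drop check, using a hypothetical `Holder` type. As written, it compiles because `Holder` has no drop glue, so dropping it cannot observe the dangling reference. According to the paragraph above, uncommenting the empty `Drop` impl should make the compiler reject `demo`. With that impl in place, `holder` is dropped after `s` is already gone.

```rust
// Hypothetical type: it borrows a `&str` but has no `Drop` impl
// and no field with drop glue.
struct Holder<'a> {
    text: &'a str,
}

// Uncommenting this impl makes `demo` fail to compile: drop check then
// assumes `drop` may read `self.text`, which would dangle by then.
// impl Drop for Holder<'_> {
//     fn drop(&mut self) {}
// }

fn demo() -> usize {
    let mut holder = Holder { text: "static" };
    let s = String::from("short-lived");
    holder.text = &s;
    let len = holder.text.len();
    drop(s); // `holder.text` dangles from here on...
    len
} // ...which is fine: dropping `holder` runs no code that could observe it.

fn main() {
    println!("{}", demo());
}
```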
| |
| When a type already has a `Drop` impl, **adding an extra `_owns_T: PhantomData<T>` field |
| is thus _superfluous_ and accomplishes nothing**, dropck-wise (it still affects variance |
| and auto-traits). |
| |
| - (advanced edge case: if the type containing the `PhantomData` has no `Drop` impl at all, |
| but still has drop glue (by having _another_ field with drop glue), then the |
| dropck/`#[may_dangle]` considerations mentioned herein do apply as well: a `PhantomData<T>` |
| field will then require `T` to be droppable whenever the containing type goes out of scope). |
| |
| ___ |
| |
| But this situation can sometimes lead to overly restrictive code. That's why the |
| standard library uses an unstable and `unsafe` attribute to opt back into the old |
| "unchecked" drop-checking behavior that this very documentation warned about: the |
| `#[may_dangle]` attribute. |
| |
| ### An exception: the special case of the standard library and its unstable `#[may_dangle]` |
| |
| This section can be skipped if you are only writing your own library code; but if you are |
| curious about what the standard library does with the actual `Vec` definition, you'll notice |
| that it still needs to use a `_owns_T: PhantomData<T>` field for soundness. |
| |
| <details><summary>Click here to see why</summary> |
| |
| Consider the following example: |
| |
| ```rust |
| fn main() { |
| let mut v: Vec<&str> = Vec::new(); |
| let s: String = "Short-lived".into(); |
| v.push(&s); |
| drop(s); |
| } // <- `v` is dropped here |
| ``` |
| |
| With a classical `impl<T> Drop for Vec<T>` definition, the above [is denied]. |
| |
| [is denied]: https://rust.godbolt.org/z/ans15Kqz3 |
| |
| Indeed, in this case we have a `Vec</* T = */ &'s str>` vector of `'s`-lived references |
| to `str`ings, but in the case of `let s: String`, it is dropped before the `Vec` is, and |
| thus `'s` **is expired** by the time the `Vec` is dropped, and the |
| `impl<'s> Drop for Vec<&'s str>` is used. |
| |
| This means that if such `Drop` were to be used, it would be dealing with an _expired_, or |
| _dangling_ lifetime `'s`. But this is contrary to Rust principles, where by default all |
| Rust references involved in a function signature are non-dangling and valid to dereference. |
| |
| Hence why Rust has to conservatively deny this snippet. |
| |
| And yet, in the case of the real `Vec`, the `Drop` impl does not care about `&'s str`, |
| _since it has no drop glue of its own_: it only wants to deallocate the backing buffer. |
| |
| In other words, it would be nice if the above snippet was somehow accepted, by special |
| casing `Vec`, or by relying on some special property of `Vec`: `Vec` could try to |
| _promise not to use the `&'s str`s it holds when being dropped_. |
| |
| This is the kind of `unsafe` promise that can be expressed with `#[may_dangle]`: |
| |
| ```rust ,ignore |
| unsafe impl<#[may_dangle] 's> Drop for Vec<&'s str> { /* … */ } |
| ``` |
| |
| or, more generally: |
| |
| ```rust ,ignore |
| unsafe impl<#[may_dangle] T> Drop for Vec<T> { /* … */ } |
| ``` |
| |
| is the `unsafe` way to opt out of this conservative assumption that Rust's drop |
| checker makes about type parameters of a dropped instance not being allowed to dangle. |
| |
| And when this is done, such as in the standard library, we need to be careful in the |
| case where `T` has drop glue of its own. In this instance, imagine replacing the |
| `&'s str`s with a `struct PrintOnDrop<'s> /* = */ (&'s str);` which would have a |
| `Drop` impl wherein the inner `&'s str` would be dereferenced and printed to the screen. |
| |
| Indeed, `Drop for Vec<T>`, before deallocating the backing buffer, does have to transitively |
| drop each `T` item when it has drop glue; in the case of `PrintOnDrop<'s>`, it means that |
| `Drop for Vec<PrintOnDrop<'s>>` has to transitively drop the `PrintOnDrop<'s>`s elements before |
| deallocating the backing buffer. |
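
The following sketch makes that drop order observable with the ordinary, stable `Vec`. The `PrintOnDrop` struct follows the text. Two parts are assumptions for illustration: its extra `log` field, which exists only so the output can be checked, and the `run` helper.

```rust
use std::cell::RefCell;

// `PrintOnDrop` as described above, plus a hypothetical `log` for checking.
struct PrintOnDrop<'s> {
    text: &'s str,
    log: &'s RefCell<Vec<String>>,
}

impl Drop for PrintOnDrop<'_> {
    fn drop(&mut self) {
        // Dereferences the borrowed `&'s str`: `'s` must not dangle here.
        println!("dropping {}", self.text);
        self.log.borrow_mut().push(self.text.to_string());
    }
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let s = String::from("Short-lived");
    {
        let mut v: Vec<PrintOnDrop<'_>> = Vec::new();
        v.push(PrintOnDrop { text: &s, log: &log });
        v.push(PrintOnDrop { text: "static", log: &log });
        // Calling `drop(s)` here would be rejected, even with the standard
        // library's `#[may_dangle]`: the elements' destructors still read `'s`.
    } // `v` is dropped: its elements are dropped in order, then the buffer is freed.
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("logged: {}", line);
    }
}
```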
| |
| So when we marked `'s` as `#[may_dangle]`, that was an excessively loose statement. What we really |
| want to say is: "`'s` may dangle provided it is not involved in some transitive drop glue". Or, more generally, |
| "`T` may dangle provided it is not involved in some transitive drop glue". This "exception to the |
| exception" is a pervasive situation whenever **we own a `T`**. That's why Rust's `#[may_dangle]` is |
| smart enough to know of this opt-out, and will thus be disabled _when the generic parameter is held |
| in an owned fashion_ by the fields of the struct. |
| |
| Hence why the standard library ends up with: |
| |
| ```rust |
| # #[cfg(any())] |
| // we pinky-swear not to use `T` when dropping a `Vec`… |
| unsafe impl<#[may_dangle] T> Drop for Vec<T> { |
| fn drop(&mut self) { |
| unsafe { |
| if mem::needs_drop::<T>() { |
| /* … except here, that is, … */ |
| ptr::drop_in_place::<[T]>(/* … */); |
| } |
| // … |
| dealloc(/* … */) |
| // … |
| } |
| } |
| } |
| |
| struct Vec<T> { |
| // … except for the fact that a `Vec` owns `T` items and |
| // may thus be dropping `T` items on drop! |
| _owns_T: core::marker::PhantomData<T>, |
| |
| ptr: *const T, // `*const` for variance (but this does not express ownership of a `T` *per se*) |
| len: usize, |
| cap: usize, |
| } |
| ``` |
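
For comparison, here is a stable-Rust sketch with the same shape but without `#[may_dangle]`. It uses a hypothetical fixed-capacity `MiniVec`, which supports no growth and no zero-sized `T`. Its ordinary `Drop` impl drops the initialized elements before freeing the buffer. Because that impl does not use `#[may_dangle]`, the `_owns_t` marker is superfluous for drop check here. It is kept only to mirror the layout above.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::marker::PhantomData;
use std::mem;
use std::ptr;

// Hypothetical fixed-capacity vector (no growth, no zero-sized `T`).
struct MiniVec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
    _owns_t: PhantomData<T>,
}

impl<T> MiniVec<T> {
    fn with_capacity(cap: usize) -> Self {
        assert!(mem::size_of::<T>() != 0 && cap != 0, "sketch: no ZSTs or empty buffers");
        let layout = Layout::array::<T>(cap).expect("capacity overflow");
        // SAFETY: `layout` has a non-zero size thanks to the assert above.
        let ptr = unsafe { alloc(layout) } as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        MiniVec { ptr, len: 0, cap, _owns_t: PhantomData }
    }

    fn push(&mut self, value: T) {
        assert!(self.len < self.cap, "sketch: no growth");
        // SAFETY: `len < cap`, so this slot is inside the allocation and unused.
        unsafe { self.ptr.add(self.len).write(value) };
        self.len += 1;
    }
}

impl<T> Drop for MiniVec<T> {
    fn drop(&mut self) {
        unsafe {
            if mem::needs_drop::<T>() {
                // Drop the `len` initialized elements: the "use of `T`" on drop.
                ptr::drop_in_place(ptr::slice_from_raw_parts_mut(self.ptr, self.len));
            }
            // Then free the backing buffer with the layout it was allocated with.
            dealloc(self.ptr as *mut u8, Layout::array::<T>(self.cap).unwrap());
        }
    }
}

fn main() {
    let mut v = MiniVec::with_capacity(2);
    v.push(String::from("a"));
    v.push(String::from("b"));
    println!("len = {}", v.len);
} // `v` is dropped: both `String`s are dropped, then the buffer is freed.
```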
| |
| </details> |
| |
| ___ |
| |
| Raw pointers that own an allocation are such a pervasive pattern that the |
| standard library made a utility for itself called `Unique<T>` which: |
| |
| * wraps a `*const T` for variance |
| * includes a `PhantomData<T>` |
| * auto-derives `Send`/`Sync` as if `T` were contained |
| * marks the pointer as non-null (nowadays via `NonNull<T>`) for the null-pointer optimization |
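
The following is a stable-Rust sketch of a `Unique`-like owning pointer. `MyUnique` is a hypothetical name, because the standard library's `Unique<T>` is internal and unstable. The sketch combines `NonNull<T>`, a covariant and non-null wrapper around `*const T`, with `PhantomData<T>`. It implements `Send`/`Sync` by hand because raw pointers are neither.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical stand-in for the standard library's internal `Unique<T>`.
struct MyUnique<T> {
    // `NonNull<T>` wraps a `*const T`: covariant in `T`, and never null.
    ptr: NonNull<T>,
    // Records that we own a `T` (drop check and auto traits).
    _marker: PhantomData<T>,
}

// Raw pointers are `!Send + !Sync`; restore the traits "as if `T` were contained".
// SAFETY (assumed): `MyUnique` is the sole owner of its pointee.
unsafe impl<T: Send> Send for MyUnique<T> {}
unsafe impl<T: Sync> Sync for MyUnique<T> {}

impl<T> MyUnique<T> {
    fn new(value: T) -> Self {
        // `Box::leak` yields a non-null, uniquely owned `&mut T`.
        let ptr = NonNull::from(Box::leak(Box::new(value)));
        MyUnique { ptr, _marker: PhantomData }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` comes from a `Box` we still own and have not freed.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for MyUnique<T> {
    fn drop(&mut self) {
        // SAFETY: rebuild the `Box` leaked in `new`; dropping it drops `T`
        // and frees the allocation exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}

fn assert_send_sync<X: Send + Sync>() {}

fn main() {
    let u = MyUnique::new(String::from("owned"));
    println!("{}", u.get());
    // Null-pointer optimization: `Option` adds no space.
    assert_eq!(size_of::<Option<MyUnique<u8>>>(), size_of::<MyUnique<u8>>());
    assert_send_sync::<MyUnique<String>>();
}
```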
| |
| ## Table of `PhantomData` patterns |
| |
| Here’s a table of all the wonderful ways `PhantomData` could be used: |
| |
| | Phantom type | variance of `'a` | variance of `T` | `Send`/`Sync`<br/>(or lack thereof) | dangling `'a` or `T` in drop glue<br/>(_e.g._, `#[may_dangle] Drop`) | |
| |-----------------------------|:----------------:|:-----------------:|:-----------------------------------------:|:------------------------------------------------:| |
| | `PhantomData<T>` | - | **cov**ariant | inherited | disallowed ("owns `T`") | |
| | `PhantomData<&'a T>` | **cov**ariant | **cov**ariant | `Send + Sync`<br/>requires<br/>`T : Sync` | allowed | |
| | `PhantomData<&'a mut T>` | **cov**ariant | **inv**ariant | inherited | allowed | |
| | `PhantomData<*const T>` | - | **cov**ariant | `!Send + !Sync` | allowed | |
| | `PhantomData<*mut T>` | - | **inv**ariant | `!Send + !Sync` | allowed | |
| | `PhantomData<fn(T)>` | - | **contra**variant | `Send + Sync` | allowed | |
| | `PhantomData<fn() -> T>` | - | **cov**ariant | `Send + Sync` | allowed | |
| | `PhantomData<fn(T) -> T>` | - | **inv**ariant | `Send + Sync` | allowed | |
| | `PhantomData<Cell<&'a ()>>` | **inv**ariant | - | `Send + !Sync` | allowed | |
| |
| - Note: opting out of the `Unpin` auto-trait requires the dedicated [`PhantomPinned`] type instead. |
| |
| [`PhantomPinned`]: ../core/marker/struct.PhantomPinned.html |
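
The compiler can check a few rows of the table above. The wrapper structs below are hypothetical. Each function type-checks only if the corresponding variance entry holds. The two trait-bound helpers confirm two of the `Send`/`Sync` entries.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Hypothetical wrappers, one per table row being checked.
struct Covariant<T>(PhantomData<fn() -> T>);
struct Contravariant<T>(PhantomData<fn(T)>);
struct Invariant<T>(PhantomData<fn(T) -> T>);

// Covariant: a `&'static str` parameter may shrink to any `&'a str`.
fn shrink<'a>(c: Covariant<&'static str>) -> Covariant<&'a str> {
    c
}

// Contravariant: a `&'a str` parameter may grow to `&'static str`.
fn grow<'a>(c: Contravariant<&'a str>) -> Contravariant<&'static str> {
    c
}

// Invariant: only the identical type is accepted; changing the return type
// to `Invariant<&'static str>` would fail to compile.
fn same<'a>(i: Invariant<&'a str>) -> Invariant<&'a str> {
    i
}

fn assert_send_sync<X: Send + Sync>() {}
fn assert_send<X: Send>() {}

fn main() {
    let _ = shrink(Covariant(PhantomData));
    let _ = grow(Contravariant(PhantomData));
    let _ = same(Invariant(PhantomData));
    // `fn` pointers are `Send + Sync` whatever their argument types.
    assert_send_sync::<PhantomData<fn(*mut u8)>>();
    // `Cell<&'a ()>` is `Send` (but not `Sync`).
    assert_send::<PhantomData<Cell<&'static ()>>>();
    println!("variance and auto-trait checks compiled");
}
```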