| # Handling Zero-Sized Types |
| |
| It's time. We're going to fight the specter that is zero-sized types. Safe Rust |
| *never* needs to care about this, but Vec is very intensive on raw pointers and |
| raw allocations, which are exactly the two things that care about |
| zero-sized types. We need to be careful of two things: |
| |
| * The raw allocator API has undefined behavior if you pass in 0 for an |
| allocation size. |
| * raw pointer offsets are no-ops for zero-sized types, which will break our |
| C-style pointer iterator. |
| |
| Thankfully we abstracted out pointer-iterators and allocating handling into |
| `RawValIter` and `RawVec` respectively. How mysteriously convenient. |
| |
| ## Allocating Zero-Sized Types |
| |
| So if the allocator API doesn't support zero-sized allocations, what on earth |
| do we store as our allocation? `NonNull::dangling()` of course! Almost every operation |
| with a ZST is a no-op since ZSTs have exactly one value, and therefore no state needs |
| to be considered to store or load them. This actually extends to `ptr::read` and |
| `ptr::write`: they won't actually look at the pointer at all. As such we never need |
| to change the pointer. |
| |
| Note however that our previous reliance on running out of memory before overflow is |
| no longer valid with zero-sized types. We must explicitly guard against capacity |
| overflow for zero-sized types. |
| |
| Due to our current architecture, all this means is writing 3 guards, one in each |
| method of `RawVec`. |
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
| impl<T> RawVec<T> { |
| fn new() -> Self { |
| // This branch should be stripped at compile time. |
| let cap = if mem::size_of::<T>() == 0 { usize::MAX } else { 0 }; |
| |
| // `NonNull::dangling()` doubles as "unallocated" and "zero-sized allocation" |
| RawVec { |
| ptr: NonNull::dangling(), |
| cap, |
| } |
| } |
| |
| fn grow(&mut self) { |
| // since we set the capacity to usize::MAX when T has size 0, |
| // getting to here necessarily means the Vec is overfull. |
| assert!(mem::size_of::<T>() != 0, "capacity overflow"); |
| |
| let (new_cap, new_layout) = if self.cap == 0 { |
| (1, Layout::array::<T>(1).unwrap()) |
| } else { |
| // This can't overflow because we ensure self.cap <= isize::MAX. |
| let new_cap = 2 * self.cap; |
| |
| // `Layout::array` checks that the number of bytes is <= usize::MAX, |
| // but this is redundant since old_layout.size() <= isize::MAX, |
| // so the `unwrap` should never fail. |
| let new_layout = Layout::array::<T>(new_cap).unwrap(); |
| (new_cap, new_layout) |
| }; |
| |
| // Ensure that the new allocation doesn't exceed `isize::MAX` bytes. |
| assert!(new_layout.size() <= isize::MAX as usize, "Allocation too large"); |
| |
| let new_ptr = if self.cap == 0 { |
| unsafe { alloc::alloc(new_layout) } |
| } else { |
| let old_layout = Layout::array::<T>(self.cap).unwrap(); |
| let old_ptr = self.ptr.as_ptr() as *mut u8; |
| unsafe { alloc::realloc(old_ptr, old_layout, new_layout.size()) } |
| }; |
| |
| // If allocation fails, `new_ptr` will be null, in which case we abort. |
| self.ptr = match NonNull::new(new_ptr as *mut T) { |
| Some(p) => p, |
| None => alloc::handle_alloc_error(new_layout), |
| }; |
| self.cap = new_cap; |
| } |
| } |
| |
| impl<T> Drop for RawVec<T> { |
| fn drop(&mut self) { |
| let elem_size = mem::size_of::<T>(); |
| |
| if self.cap != 0 && elem_size != 0 { |
| unsafe { |
| alloc::dealloc( |
| self.ptr.as_ptr() as *mut u8, |
| Layout::array::<T>(self.cap).unwrap(), |
| ); |
| } |
| } |
| } |
| } |
| ``` |
| |
| That's it. We support pushing and popping zero-sized types now. Our iterators |
| (that aren't provided by slice Deref) are still busted, though. |
| |
| ## Iterating Zero-Sized Types |
| |
| Zero-sized offsets are no-ops. This means that our current design will always |
| initialize `start` and `end` as the same value, and our iterators will yield |
| nothing. The current solution to this is to cast the pointers to integers, |
| increment, and then cast them back: |
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
| impl<T> RawValIter<T> { |
| unsafe fn new(slice: &[T]) -> Self { |
| RawValIter { |
| start: slice.as_ptr(), |
| end: if mem::size_of::<T>() == 0 { |
| ((slice.as_ptr() as usize) + slice.len()) as *const _ |
| } else if slice.len() == 0 { |
| slice.as_ptr() |
| } else { |
| slice.as_ptr().add(slice.len()) |
| }, |
| } |
| } |
| } |
| ``` |
| |
| Now we have a different bug. Instead of our iterators not running at all, our |
| iterators now run *forever*. We need to do the same trick in our iterator impls. |
| Also, our size_hint computation code will divide by 0 for ZSTs. Since we'll |
| basically be treating the two pointers as if they point to bytes, we'll just |
| map size 0 to divide by 1. Here's what `next` will be: |
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
| fn next(&mut self) -> Option<T> { |
| if self.start == self.end { |
| None |
| } else { |
| unsafe { |
| let result = ptr::read(self.start); |
| self.start = if mem::size_of::<T>() == 0 { |
| (self.start as usize + 1) as *const _ |
| } else { |
| self.start.offset(1) |
| }; |
| Some(result) |
| } |
| } |
| } |
| ``` |
| |
| Do you see the "bug"? No one else did! The original author only noticed the |
| problem when linking to this page years later. This code is kind of dubious |
| because abusing the iterator pointers to be *counters* makes them unaligned! |
| Our *one job* when using ZSTs is to keep pointers aligned! *forehead slap* |
| |
| Raw pointers don't need to be aligned at all times, so the basic trick of |
| using pointers as counters is *fine*, but they *should* definitely be aligned |
| when passed to `ptr::read`! This is *possibly* needless pedantry |
| because `ptr::read` is a noop for a ZST, but let's be a *little* more |
| responsible and read from `NonNull::dangling` on the ZST path. |
| |
| (Alternatively you could call `read_unaligned` on the ZST path. Either is fine, |
| because either way we're making up a value from nothing and it all compiles |
| to doing nothing.) |
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
| impl<T> Iterator for RawValIter<T> { |
| type Item = T; |
| fn next(&mut self) -> Option<T> { |
| if self.start == self.end { |
| None |
| } else { |
| unsafe { |
| if mem::size_of::<T>() == 0 { |
| self.start = (self.start as usize + 1) as *const _; |
| Some(ptr::read(NonNull::<T>::dangling().as_ptr())) |
| } else { |
| let old_ptr = self.start; |
| self.start = self.start.offset(1); |
| Some(ptr::read(old_ptr)) |
| } |
| } |
| } |
| } |
| |
| fn size_hint(&self) -> (usize, Option<usize>) { |
| let elem_size = mem::size_of::<T>(); |
| let len = (self.end as usize - self.start as usize) |
| / if elem_size == 0 { 1 } else { elem_size }; |
| (len, Some(len)) |
| } |
| } |
| |
| impl<T> DoubleEndedIterator for RawValIter<T> { |
| fn next_back(&mut self) -> Option<T> { |
| if self.start == self.end { |
| None |
| } else { |
| unsafe { |
| if mem::size_of::<T>() == 0 { |
| self.end = (self.end as usize - 1) as *const _; |
| Some(ptr::read(NonNull::<T>::dangling().as_ptr())) |
| } else { |
| self.end = self.end.offset(-1); |
| Some(ptr::read(self.end)) |
| } |
| } |
| } |
| } |
| } |
| ``` |
| |
| And that's it. Iteration works! |