| # Exception Safety |
| |
| Although programs should use unwinding sparingly, there's a lot of code that |
| *can* panic. If you unwrap a None, index out of bounds, or divide by 0, your |
| program will panic. On debug builds, every arithmetic operation can panic |
| if it overflows. Unless you are very careful and tightly control what code runs, |
| pretty much everything can unwind, and you need to be ready for it. |
| |
| Being ready for unwinding is often referred to as *exception safety* |
| in the broader programming world. In Rust, there are two levels of exception |
| safety that one may concern themselves with: |
| |
| * In unsafe code, we *must* be exception safe to the point of not violating |
| memory safety. We'll call this *minimal* exception safety. |
| |
| * In safe code, it is *good* to be exception safe to the point of your program |
| doing the right thing. We'll call this *maximal* exception safety. |
| |
| As is the case in many places in Rust, Unsafe code must be ready to deal with |
| bad Safe code when it comes to unwinding. Code that transiently creates |
| unsound states must be careful that a panic does not cause that state to be |
| used. Generally this means ensuring that only non-panicking code is run while |
| these states exist, or making a guard that cleans up the state in the case of |
| a panic. This does not necessarily mean that the state a panic witnesses is a |
| fully coherent state. We need only guarantee that it's a *safe* state. |
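
As a tiny illustration of the guard idea, a destructor runs during unwinding as
well as on normal return, so cleanup placed in `Drop` can't be skipped by a
panic. (The `Guard` type here is just a sketch for this discussion, not a
standard-library API.)

```rust
// A minimal drop guard: the closure runs when the guard goes out of
// scope, whether the enclosing code returns normally or unwinds.
struct Guard<F: FnMut()>(F);

impl<F: FnMut()> Drop for Guard<F> {
    fn drop(&mut self) {
        (self.0)();
    }
}

fn do_something_risky() {
    let _guard = Guard(|| println!("cleanup runs even if we panic"));
    // ... code that might panic goes here ...
}
```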
| |
| Most Unsafe code is leaf-like, and therefore fairly easy to make exception-safe. |
| It controls all the code that runs, and most of that code can't panic. However |
| it is not uncommon for Unsafe code to work with arrays of temporarily |
| uninitialized data while repeatedly invoking caller-provided code. Such code |
| needs to be careful and consider exception safety. |
| |
| ## Vec::push_all |
| |
`Vec::push_all` is a temporary hack to make extending a Vec by a slice reliably
efficient without specialization. Here's a simple implementation:
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
| impl<T: Clone> Vec<T> { |
| fn push_all(&mut self, to_push: &[T]) { |
| self.reserve(to_push.len()); |
| unsafe { |
            let old_len = self.len();
            // can't overflow because we just reserved this
            self.set_len(old_len + to_push.len());

            for (i, x) in to_push.iter().enumerate() {
                // write each clone just past the old end of the Vec
                self.ptr().add(old_len + i).write(x.clone());
| } |
| } |
| } |
| } |
| ``` |
| |
| We bypass `push` in order to avoid redundant capacity and `len` checks on the |
| Vec that we definitely know has capacity. The logic is totally correct, except |
| there's a subtle problem with our code: it's not exception-safe! `set_len`, |
`add`, and `write` are all fine; `clone` is the panic bomb we overlooked.
| |
| Clone is completely out of our control, and is totally free to panic. If it |
| does, our function will exit early with the length of the Vec set too large. If |
| the Vec is looked at or dropped, uninitialized memory will be read! |
| |
| The fix in this case is fairly simple. If we want to guarantee that the values |
| we *did* clone are dropped, we can set the `len` every loop iteration. If we |
| just want to guarantee that uninitialized memory can't be observed, we can set |
| the `len` after the loop. |
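
As a sketch of the first option, we can reuse the simplified `ptr` and
`set_len` helpers from the listing above (this is illustrative, not the real
standard-library code): `len` is only bumped after each element is fully
written, so a panicking `clone` leaves the Vec covering exactly the elements
that were successfully initialized.

<!-- ignore: simplified code -->
```rust,ignore
impl<T: Clone> Vec<T> {
    fn push_all(&mut self, to_push: &[T]) {
        self.reserve(to_push.len());
        unsafe {
            for x in to_push {
                let len = self.len();
                // clone first; if this panics, `len` still only covers
                // fully initialized elements, so unwinding is safe
                self.ptr().add(len).write(x.clone());
                // now admit the freshly written element into the Vec
                self.set_len(len + 1);
            }
        }
    }
}
```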
| |
| ## BinaryHeap::sift_up |
| |
| Bubbling an element up a heap is a bit more complicated than extending a Vec. |
| The pseudocode is as follows: |
| |
| ```text |
| bubble_up(heap, index): |
| while index != 0 && heap[index] < heap[parent(index)]: |
| heap.swap(index, parent(index)) |
| index = parent(index) |
| ``` |
| |
A literal transcription of this code to Rust is totally fine, but has an annoying
performance characteristic: the element being bubbled up is swapped over and over
again uselessly. We would rather have the following:
| |
| ```text |
| bubble_up(heap, index): |
| let elem = heap[index] |
| while index != 0 && elem < heap[parent(index)]: |
| heap[index] = heap[parent(index)] |
| index = parent(index) |
| heap[index] = elem |
| ``` |
| |
This code ensures that each element is copied as little as possible (it is in
fact necessary that elem be copied twice in general). However, it now exposes
some exception safety trouble! At all times, there exist two copies of one
value. If we panic in this function, something will be double-dropped.
Unfortunately, we also don't have full control of the code: that comparison is
user-defined!
| |
| Unlike Vec, the fix isn't as easy here. One option is to break the user-defined |
| code and the unsafe code into two separate phases: |
| |
| ```text |
| bubble_up(heap, index): |
| let end_index = index; |
    while end_index != 0 && heap[index] < heap[parent(end_index)]:
| end_index = parent(end_index) |
| |
| let elem = heap[index] |
| while index != end_index: |
| heap[index] = heap[parent(index)] |
| index = parent(index) |
| heap[index] = elem |
| ``` |
| |
| If the user-defined code blows up, that's no problem anymore, because we haven't |
| actually touched the state of the heap yet. Once we do start messing with the |
| heap, we're working with only data and functions that we trust, so there's no |
| concern of panics. |
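
As a rough sketch, the two-phase version might look something like this in
Rust, assuming the same simplified `BinaryHeap` (with a `data` field), the
`parent` helper, and the `std::ptr` functions used in the final listing of
this section; the second phase uses raw pointer moves precisely because
nothing in it can panic:

<!-- ignore: simplified code -->
```rust,ignore
impl<T: Ord> BinaryHeap<T> {
    fn sift_up(&mut self, mut index: usize) {
        // Phase 1: run only the untrusted comparisons. Nothing has moved
        // yet, so a panicking comparison leaves the heap untouched.
        // (This follows the min-heap pseudocode above; std's BinaryHeap
        // is a max-heap and compares the other way.)
        let mut end_index = index;
        while end_index != 0 && self.data[index] < self.data[parent(end_index)] {
            end_index = parent(end_index);
        }

        // Phase 2: only trusted, non-panicking code (raw pointer moves).
        unsafe {
            let elem = ptr::read(&self.data[index]);
            while index != end_index {
                let parent = parent(index);
                let src: *const T = &self.data[parent];
                let dst: *mut T = &mut self.data[index];
                ptr::copy_nonoverlapping(src, dst, 1);
                index = parent;
            }
            ptr::write(&mut self.data[index], elem);
        }
    }
}
```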
| |
| Perhaps you're not happy with this design. Surely it's cheating! And we have |
| to do the complex heap traversal *twice*! Alright, let's bite the bullet. Let's |
| intermix untrusted and unsafe code *for reals*. |
| |
| If Rust had `try` and `finally` like in Java, we could do the following: |
| |
| ```text |
| bubble_up(heap, index): |
| let elem = heap[index] |
| try: |
| while index != 0 && elem < heap[parent(index)]: |
| heap[index] = heap[parent(index)] |
| index = parent(index) |
| finally: |
| heap[index] = elem |
| ``` |
| |
The basic idea is simple: if the comparison panics, we just toss the loose
element into the logically uninitialized index and bail out. Anyone who observes
| the heap will see a potentially *inconsistent* heap, but at least it won't |
| cause any double-drops! If the algorithm terminates normally, then this |
| operation happens to coincide precisely with how we finish up regardless. |
| |
| Sadly, Rust has no such construct, so we're going to need to roll our own! The |
| way to do this is to store the algorithm's state in a separate struct with a |
| destructor for the "finally" logic. Whether we panic or not, that destructor |
| will run and clean up after us. |
| |
| <!-- ignore: simplified code --> |
| ```rust,ignore |
use std::ptr;

struct Hole<'a, T: 'a> {
| data: &'a mut [T], |
| /// `elt` is always `Some` from new until drop. |
| elt: Option<T>, |
| pos: usize, |
| } |
| |
| impl<'a, T> Hole<'a, T> { |
| fn new(data: &'a mut [T], pos: usize) -> Self { |
| unsafe { |
            // Move the element out, leaving a logical hole at `pos`.
            // The destructor below guarantees the hole gets refilled.
            let elt = ptr::read(&data[pos]);
| Hole { |
| data, |
| elt: Some(elt), |
| pos, |
| } |
| } |
| } |
| |
| fn pos(&self) -> usize { self.pos } |
| |
| fn removed(&self) -> &T { self.elt.as_ref().unwrap() } |
| |
| fn get(&self, index: usize) -> &T { &self.data[index] } |
| |
| unsafe fn move_to(&mut self, index: usize) { |
        // Move the element at `index` into the hole, making `index`
        // the new hole. The two positions never alias.
        let index_ptr: *const _ = &self.data[index];
| let hole_ptr = &mut self.data[self.pos]; |
| ptr::copy_nonoverlapping(index_ptr, hole_ptr, 1); |
| self.pos = index; |
| } |
| } |
| |
| impl<'a, T> Drop for Hole<'a, T> { |
| fn drop(&mut self) { |
| // fill the hole again |
| unsafe { |
| let pos = self.pos; |
| ptr::write(&mut self.data[pos], self.elt.take().unwrap()); |
| } |
| } |
| } |
| |
| impl<T: Ord> BinaryHeap<T> { |
| fn sift_up(&mut self, pos: usize) { |
| unsafe { |
| // Take out the value at `pos` and create a hole. |
| let mut hole = Hole::new(&mut self.data, pos); |
| |
| while hole.pos() != 0 { |
| let parent = parent(hole.pos()); |
                // BinaryHeap is a max-heap, so we move up while we're
                // bigger than our parent. This comparison can panic!
                if hole.removed() <= hole.get(parent) { break }
| hole.move_to(parent); |
| } |
| // Hole will be unconditionally filled here; panic or not! |
| } |
| } |
| } |
| ``` |