 # Leaking

 Ownership-based resource management is intended to simplify composition. You
 acquire resources when you create the object, and you release the resources when
 it gets destroyed. Since destruction is handled for you, it means you can't
 forget to release the resources, and it happens as soon as possible! Surely this
 is perfect and all of our problems are solved.

 Everything is terrible and we have new and exotic problems to try to solve.

 Many people like to believe that Rust eliminates resource leaks. In practice,
 this is basically true. You would be surprised to see a Safe Rust program
 leak resources in an uncontrolled way.

 However, from a theoretical perspective this is absolutely not the case, no
 matter how you look at it. In the strictest sense, "leaking" is so abstract as
 to be unpreventable. It's quite trivial to initialize a collection at the start
 of a program, fill it with tons of objects with destructors, and then enter an
 infinite event loop that never refers to it. The collection will sit around
 uselessly, holding on to its precious resources until the program terminates (at
 which point all those resources would have been reclaimed by the OS anyway).

 We may consider a more restricted form of leak: failing to drop a value that is
 unreachable. Rust also doesn't prevent this. In fact Rust *has a function for
 doing this*: `mem::forget`. This function consumes the value it is passed *and
 then doesn't run its destructor*.
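To make that concrete, here is a minimal sketch that contrasts `drop` with `mem::forget`. The `Tracked` type and the `forget_skips_drop` function are hypothetical names invented for this illustration; only `mem::forget`, `drop`, and `Cell` come from the standard library.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical type: records in a shared flag whether its destructor ran.
struct Tracked<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns (did the dropped value run its destructor?,
//          did the forgotten value run its destructor?).
fn forget_skips_drop() -> (bool, bool) {
    let normal = Cell::new(false);
    let forgotten = Cell::new(false);

    // Ordinary destruction: the destructor runs and sets the flag.
    drop(Tracked { dropped: &normal });

    // The value is consumed, but its destructor never runs.
    mem::forget(Tracked { dropped: &forgotten });

    (normal.get(), forgotten.get())
}
```

Calling `forget_skips_drop()` yields `(true, false)`: both values are gone, but only one of them got to clean up after itself.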

 In the past, `mem::forget` was marked as unsafe as a sort of lint against using
 it, since failing to call a destructor is generally not a well-behaved thing to
 do (though useful for some special unsafe code). However, this was generally
 determined to be an untenable stance to take: there are many ways to fail to
 call a destructor in safe code. The most famous example is creating a cycle of
 reference-counted pointers using interior mutability.
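As a rough sketch of that famous example, the following builds a two-node cycle out of `Rc` and `RefCell` without a single line of `unsafe`. The `Node` type, its `drops` counter, and the `leak_cycle` function are assumptions made up for this illustration.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

// Hypothetical node type: `next` uses interior mutability so that a cycle
// can be closed after both nodes exist; `drops` counts destructor runs.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<usize>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Builds a -> b -> a, lets both handles go out of scope, and reports how
// many `Node` destructors ran.
fn leak_cycle() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node {
            next: RefCell::new(None),
            drops: Rc::clone(&drops),
        });
        let b = Rc::new(Node {
            next: RefCell::new(Some(Rc::clone(&a))),
            drops: Rc::clone(&drops),
        });
        // Close the cycle: each node now keeps the other alive.
        *a.next.borrow_mut() = Some(Rc::clone(&b));
    }
    // Both local handles are gone, yet each node still has a strong count
    // of 1, so neither destructor has run.
    drops.get()
}
```

`leak_cycle()` returns `0`: the nodes are unreachable, their destructors never ran, and no `unsafe` or `mem::forget` was involved.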

 It is reasonable for safe code to assume that destructor leaks do not happen, as
 any program that leaks destructors is probably wrong. However *unsafe* code
 cannot rely on destructors to be run in order to be safe. For most types this
 doesn't matter: if you leak the destructor then the type is by definition
 inaccessible, so it doesn't matter, right? For instance, if you leak a `Box<u8>`
 then you waste some memory but that's hardly going to violate memory-safety.

 However, the place where we must be careful with destructor leaks is *proxy*
 types. These are types which manage access to a distinct object, but don't
 actually own it. Proxy objects are quite rare. Proxy objects you'll need to care
 about are even rarer. However, we'll focus on three interesting examples in the
 standard library:

 * `vec::Drain`
 * `Rc`
 * `thread::scoped::JoinGuard`

 ## Drain

 `drain` is a collections API that moves data out of the container without
 consuming the container. This enables us to reuse the allocation of a `Vec`
 after claiming ownership over all of its contents. It produces an iterator
 (Drain) that returns the contents of the Vec by-value.

 Now, consider Drain in the middle of iteration: some values have been moved out,
 and others haven't. This means that part of the Vec is now full of logically
 uninitialized data! We could backshift all the elements in the Vec every time we
 remove a value, but this would have pretty catastrophic performance
 consequences.

 Instead, we would like Drain to fix the Vec's backing storage when it is
 dropped. It should run itself to completion, backshift any elements that weren't
 removed (drain supports subranges), and then fix Vec's `len`. It's even
 unwinding-safe! Easy!

 Now consider the following:

 <!-- ignore: simplified code -->
 ```rust,ignore
 let mut vec = vec![Box::new(0); 4];

 {
     // start draining, vec can no longer be accessed
     let mut drainer = vec.drain(..);

     // pull out two elements and immediately drop them
     drainer.next();
     drainer.next();

     // get rid of drainer, but don't call its destructor
     mem::forget(drainer);
 }

 // Oops, vec[0] was dropped, we're reading a pointer into free'd memory!
 println!("{}", vec[0]);
 ```

 This is pretty clearly Not Good. Unfortunately, we're kind of stuck between a
 rock and a hard place: maintaining consistent state at every step has an
 enormous cost (and would negate any benefits of the API). Failing to maintain
 consistent state gives us Undefined Behavior in safe code (making the API
 unsound).

 So what can we do? Well, we can pick a trivially consistent state: set the Vec's
 len to be 0 when we start the iteration, and fix it up if necessary in the
 destructor. That way, if everything executes like normal we get the desired
 behavior with minimal overhead. But if someone has the *audacity* to
 mem::forget us in the middle of the iteration, all that does is *leak even more*
 (and possibly leave the Vec in an unexpected but otherwise consistent state).
 Since we've accepted that mem::forget is safe, this is definitely safe. We call
 leaks causing more leaks a *leak amplification*.
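Leak amplification can be observed with the real `Vec::drain`. The sketch below uses a hypothetical `Counted` type and `forget_drain` function to count destructor runs. Note the assumption: the standard library documentation only promises that a leaked `Drain` may leave the vector having lost and leaked elements arbitrarily, so the exact numbers are an implementation detail rather than a guarantee.

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

// Hypothetical element type: bumps a shared counter when it is dropped.
struct Counted(Rc<Cell<usize>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns (vec.len() after the Drain was forgotten,
//          number of element destructors that ran in total).
fn forget_drain() -> (usize, usize) {
    let drops = Rc::new(Cell::new(0));
    let mut vec: Vec<Counted> =
        (0..4).map(|_| Counted(Rc::clone(&drops))).collect();

    {
        let mut drainer = vec.drain(..);

        // Pull out two elements and immediately drop them.
        drop(drainer.next());
        drop(drainer.next());

        // Get rid of the drainer without running its destructor.
        mem::forget(drainer);
    }

    // Safe to inspect: the Vec is in a consistent state, whatever it is.
    let len_after = vec.len();
    drop(vec);
    (len_after, drops.get())
}
```

With the strategy described above, one would expect `forget_drain()` to report a length of 0 and only 2 destructor runs: the two elements that were never yielded are leaked rather than exposed as freed memory.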

 ## Rc

 Rc is an interesting case because at first glance it doesn't appear to be a
 proxy value at all. After all, it manages the data it points to, and dropping
 all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like it
 would be particularly dangerous. It will leave the refcount permanently
 incremented and prevent the data from being freed or dropped, but that seems
 just like Box, right?

 Nope.

 Let's consider a simplified implementation of Rc:

 <!-- ignore: simplified code -->
 ```rust,ignore
 struct Rc<T> {
     ptr: *mut RcBox<T>,
 }

 struct RcBox<T> {
     data: T,
     ref_count: usize,
 }

 impl<T> Rc<T> {
     fn new(data: T) -> Self {
         unsafe {
             // Wouldn't it be nice if heap::allocate worked like this?
             let ptr = heap::allocate::<RcBox<T>>();
             ptr::write(ptr, RcBox {
                 data,
                 ref_count: 1,
             });
             Rc { ptr }
         }
     }

     fn clone(&self) -> Self {
         unsafe {
             (*self.ptr).ref_count += 1;
         }
         Rc { ptr: self.ptr }
     }
 }

 impl<T> Drop for Rc<T> {
     fn drop(&mut self) {
         unsafe {
             (*self.ptr).ref_count -= 1;
             if (*self.ptr).ref_count == 0 {
                 // drop the data and then free it
                 ptr::read(self.ptr);
                 heap::deallocate(self.ptr);
             }
         }
     }
 }
 ```

 This code contains an implicit and subtle assumption: `ref_count` can fit in a
 `usize`, because there can't be more than `usize::MAX` Rcs in memory. However
 this itself assumes that the `ref_count` accurately reflects the number of Rcs
 in memory, which we know is false with `mem::forget`. Using `mem::forget` we can
 overflow the `ref_count`, and then get it down to 0 with outstanding Rcs. Then
 we can happily use-after-free the inner data. Bad Bad Not Good.

 This can be solved by just checking the `ref_count` and doing *something*. The
 standard library's stance is to just abort, because your program has become
 horribly degenerate. Also *oh my gosh* it's such a ridiculous corner case.
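One way to "check and do something" is sketched below. This is not the standard library's actual code: `MyRc`, `count`, `get`, and `count_after_forgetting_a_clone` are hypothetical names, and the allocation goes through `Box::into_raw` instead of the imaginary `heap::allocate`. The point is only the `checked_add` in `clone`, which aborts instead of letting the count wrap around.

```rust
use std::mem;
use std::process;

struct RcBox<T> {
    data: T,
    ref_count: usize,
}

// Hypothetical simplified Rc; single-threaded, no weak references.
struct MyRc<T> {
    ptr: *mut RcBox<T>,
}

impl<T> MyRc<T> {
    fn new(data: T) -> Self {
        // Let Box do the allocation, then take over the raw pointer.
        let ptr = Box::into_raw(Box::new(RcBox { data, ref_count: 1 }));
        MyRc { ptr }
    }

    fn get(&self) -> &T {
        unsafe { &(*self.ptr).data }
    }

    fn count(&self) -> usize {
        unsafe { (*self.ptr).ref_count }
    }
}

impl<T> Clone for MyRc<T> {
    fn clone(&self) -> Self {
        unsafe {
            // Abort rather than wrap: a wrapped count could later reach 0
            // while handles are still alive, causing a use-after-free.
            let next = (*self.ptr)
                .ref_count
                .checked_add(1)
                .unwrap_or_else(|| process::abort());
            (*self.ptr).ref_count = next;
        }
        MyRc { ptr: self.ptr }
    }
}

impl<T> Drop for MyRc<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).ref_count -= 1;
            if (*self.ptr).ref_count == 0 {
                // Rebuild the Box so the data is dropped and the memory freed.
                drop(Box::from_raw(self.ptr));
            }
        }
    }
}

// Forgetting a clone leaves the count permanently one too high.
fn count_after_forgetting_a_clone() -> usize {
    let a = MyRc::new(5);
    mem::forget(a.clone());
    a.count()
}
```

`count_after_forgetting_a_clone()` returns `2` even though only one handle exists. Repeating that `usize::MAX` times is what the overflow check guards against, and the cost of the check is a single branch on each clone.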

 ## thread::scoped::JoinGuard

 > Note: This API has already been removed from std. For more information,
 > you may refer to [issue #24292](https://github.com/rust-lang/rust/issues/24292).
 >
 > This section remains here because we think this example is still
 > important, regardless of whether it is part of std or not.

 The thread::scoped API was intended to allow threads to be spawned that reference
 data on their parent's stack without any synchronization over that data by
 ensuring the parent joins the thread before any of the shared data goes out
 of scope.

 <!-- ignore: simplified code -->
 ```rust,ignore
 pub fn scoped<'a, F>(f: F) -> JoinGuard<'a>
     where F: FnOnce() + Send + 'a
 ```

 Here `f` is some closure for the other thread to execute. Saying that
 `F: Send + 'a` is saying that it closes over data that lives for `'a`, and it
 either owns that data or the data was Sync (implying `&data` is Send).
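The relationship between `Sync` and `Send` mentioned here can be checked at compile time. The following is a small sketch; `assert_send` and `check_send_bounds` are hypothetical helpers written for this illustration, not standard library functions.

```rust
use std::cell::Cell;

// Hypothetical helper: compiles only when `T: Send`.
fn assert_send<T: Send>() {}

fn check_send_bounds() {
    // Owned data: `Box<i32>` is Send, so a closure owning it can be Send.
    assert_send::<Box<i32>>();

    // Shared data: `i32` is Sync, which is exactly what makes `&i32` Send.
    assert_send::<&i32>();

    // `Cell<i32>` may be moved to another thread (it is Send)...
    assert_send::<Cell<i32>>();

    // ...but it is not Sync, so a shared reference to it is not Send.
    // Uncommenting the next line is a compile error:
    // assert_send::<&Cell<i32>>();
}
```

A closure is `Send` only if everything it captures is `Send`, so these are the same checks the compiler performs on `F`.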

 Because JoinGuard has a lifetime, it keeps all the data it closes over
 borrowed in the parent thread. This means the JoinGuard can't outlive
 the data that the other thread is working on. When the JoinGuard *does* get
 dropped it blocks the parent thread, ensuring the child terminates before any
 of the closed-over data goes out of scope in the parent.

 Usage looked like:

 <!-- ignore: simplified code -->
 ```rust,ignore
 let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
 {
     let mut guards = vec![];
     for x in &mut data {
         // Move the mutable reference into the closure, and execute
         // it on a different thread. The closure has a lifetime bound
         // by the lifetime of the mutable reference `x` we store in it.
         // The guard that is returned is in turn assigned the lifetime
         // of the closure, so it also mutably borrows `data` as `x` did.
         // This means we cannot access `data` until the guard goes away.
         let guard = thread::scoped(move || {
             *x *= 2;
         });
         // store the thread's guard for later
         guards.push(guard);
     }
     // All guards are dropped here, forcing the threads to join
     // (this thread blocks here until the others terminate).
     // Once the threads join, the borrow expires and the data becomes
     // accessible again in this thread.
 }
 // data is definitely mutated here.
 ```

 In principle, this totally works! Rust's ownership system perfectly ensures it!
 ...except it relies on a destructor being called to be safe.

 <!-- ignore: simplified code -->
 ```rust,ignore
 let mut data = Box::new(0);
 {
     let guard = thread::scoped(|| {
         // This is at best a data race. At worst, it's also a use-after-free.
         *data += 1;
     });
     // The guard is forgotten here, so the loan expires without blocking this
     // thread.
     mem::forget(guard);
 }
 // So the Box is dropped here while the scoped thread may or may not be trying
 // to access it.
 ```

 Dang. Here the destructor running was pretty fundamental to the API, and it had
 to be scrapped in favor of a completely different design.
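For comparison, one design that does not depend on a destructor is the closure-based `std::thread::scope`, which is part of the standard library today. Whether this is exactly the design the text has in mind is an assumption; `double_all` is a hypothetical helper that mirrors the earlier usage example.

```rust
use std::thread;

// Doubles every element, using one scoped thread per element.
fn double_all(data: &mut [i32]) {
    thread::scope(|s| {
        for x in data.iter_mut() {
            // Each thread receives its own disjoint `&mut i32`.
            s.spawn(move || {
                *x *= 2;
            });
        }
        // No guard escapes the closure: `thread::scope` itself joins every
        // spawned thread before it returns.
    });
    // All threads have finished here, so the borrow of `data` has ended.
}
```

Because the join happens inside `thread::scope` rather than in a destructor owned by the caller, there is no value the caller could pass to `mem::forget` to skip it. For example, after `double_all(&mut data)` on `[1, 2, 3]`, `data` is `[2, 4, 6]`.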