# Introduction: the sleep module

The code in this module governs when worker threads should go to
sleep. The system used in this code was introduced in [Rayon RFC #5].
There is also a [video walkthrough] available. Both of those may be
valuable resources for understanding the code, though naturally they
will also grow stale over time. The comments in this file are
extracted from the RFC and meant to be kept up to date.

[Rayon RFC #5]: https://github.com/rayon-rs/rfcs/pull/5
[video walkthrough]: https://youtu.be/HvmQsE5M4cY

# The `Sleep` struct

The `Sleep` struct is embedded into each registry. It performs several functions:

* It tracks when workers are awake or asleep.
* It decides how long a worker should look for work before it goes to sleep,
  via a callback that is invoked periodically from the worker's search loop.
* It is notified when latches are set, jobs are published, or other
  events occur, and it will go and wake the appropriate threads if
  they are sleeping.

# Thread states

There are three main thread states:

* An **active** thread is one that is actively executing a job.
* An **idle** thread is one that is searching for work to do. It will be
  trying to steal work or pop work from the global injector queue.
* A **sleeping** thread is one that is blocked on a condition variable,
  waiting to be awoken.

We sometimes refer to the final two states collectively as **inactive**.
Threads begin as active but transition to idle and finally to sleeping when
they're unable to find work to do.
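
For orientation, the three states can be pictured as a simple enum. This is
purely a conceptual sketch: the real code never stores a per-thread state
value, but instead infers these states from the counters described below.

```rust
/// Conceptual illustration only; not a type that exists in the sleep module.
enum ThreadState {
    /// Actively executing a job.
    Active,
    /// Searching other workers' deques and the injector queue for work.
    Idle,
    /// Blocked on a condition variable, waiting to be woken.
    Sleeping,
}
```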

## Sleepy threads

There is one other special state worth mentioning. During the idle state,
threads can get **sleepy**. A sleepy thread is still idle, in that it is still
searching for work, but it is *about* to go to sleep after it does one more
search (or possibly a few more). When a thread enters the sleepy
state, it signals (via the **jobs event counter**, described below) that it is
about to go to sleep. If new work is published, this will lead to the counter
being adjusted. When the thread actually goes to sleep, it will (hopefully, but
not guaranteed) see that the counter has changed and elect not to sleep, but
instead to search again. See the section on the **jobs event counter** for more
details.

# The counters

One of the key structs in the sleep module is `AtomicCounters`, found in
`counters.rs`. It packs three counters into one atomically managed value:

* Two **thread counters**, which track the number of threads in a particular state.
* The **jobs event counter**, which is used to signal when new work is available.
  It loosely tracks the number of jobs posted, but not exactly, and it can roll over.

## Thread counters

There are two thread counters, one that tracks **inactive** threads and one that
tracks **sleeping** threads. From this, one can deduce the number of threads
that are idle by subtracting sleeping threads from inactive threads. We track
the counters in this way because it permits simpler atomic operations. One can
increment the number of sleeping threads (and thus decrease the number of idle
threads) simply by doing one atomic increment, for example. Similarly, one can
decrease the number of sleeping threads (and increase the number of idle
threads) through one atomic decrement.

These counters are adjusted as follows (see the sketch below):

* When a thread enters the idle state: increment the inactive thread counter.
* When a thread enters the sleeping state: increment the sleeping thread counter.
* When a thread awakens a sleeping thread: decrement the sleeping thread counter.
  * Subtle point: the thread that *awakens* the sleeping thread decrements the
    counter, not the thread that is *sleeping*. This is because there is a delay
    between signaling a thread to wake and the thread actually waking:
    decrementing the counter when awakening the thread means that other threads
    that may be posting work will see the up-to-date value that much faster.
* When a thread finds work, exiting the idle state: decrement the inactive
  thread counter.
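
To make the packing and these adjustments concrete, here is a minimal sketch in
the spirit of `counters.rs`. The bit widths, field order, constant names, and
use of `SeqCst` ordering are assumptions for illustration only; the actual
layout and orderings are defined in `counters.rs` itself.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Assumed layout for illustration: sleeping threads in the low bits,
// inactive threads next, and the jobs event counter (JEC) in the
// remaining high bits. The real widths and order live in `counters.rs`.
const SLEEPING_SHIFT: u64 = 0;
const INACTIVE_SHIFT: u64 = 16;
const JEC_SHIFT: u64 = 32;
const THREADS_MASK: u64 = 0xFFFF;

const ONE_SLEEPING: u64 = 1 << SLEEPING_SHIFT;
const ONE_INACTIVE: u64 = 1 << INACTIVE_SHIFT;

struct AtomicCounters {
    value: AtomicU64,
}

impl AtomicCounters {
    /// Read all three counters out of a single atomic load. Idle threads
    /// are never stored; they are `inactive - sleeping`.
    fn load(&self) -> (u64, u64, u64) {
        let word = self.value.load(Ordering::SeqCst);
        let sleeping = (word >> SLEEPING_SHIFT) & THREADS_MASK;
        let inactive = (word >> INACTIVE_SHIFT) & THREADS_MASK;
        let jec = word >> JEC_SHIFT;
        (sleeping, inactive, jec)
    }

    /// Thread enters the idle state.
    fn add_inactive_thread(&self) {
        self.value.fetch_add(ONE_INACTIVE, Ordering::SeqCst);
    }

    /// Thread enters the sleeping state. It remains counted as inactive,
    /// so this single increment effectively moves it from idle to sleeping.
    fn add_sleeping_thread(&self) {
        self.value.fetch_add(ONE_SLEEPING, Ordering::SeqCst);
    }

    /// A *waking* thread removes a sleeper; the waker, not the sleeper,
    /// performs this decrement so that posting threads see it sooner.
    fn sub_sleeping_thread(&self) {
        self.value.fetch_sub(ONE_SLEEPING, Ordering::SeqCst);
    }

    /// Thread found work and leaves the idle state.
    fn sub_inactive_thread(&self) {
        self.value.fetch_sub(ONE_INACTIVE, Ordering::SeqCst);
    }
}
```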

## Jobs event counter

The final counter is the **jobs event counter**. The role of this counter is to
help sleepy threads detect when new work is posted in a lightweight fashion. In
its simplest form, we would simply have a counter that gets incremented each
time a new job is posted. This way, when a thread gets sleepy, it could read the
counter, and then compare to see if the value has changed before it actually
goes to sleep. But this [turns out to be too expensive] in practice, so we use a
somewhat more complex scheme.

[turns out to be too expensive]: https://github.com/rayon-rs/rayon/pull/746#issuecomment-624802747

The idea is that the counter toggles between two states, depending on whether
its value is even or odd (or, equivalently, on the value of its low bit):

* Even -- If the low bit is zero, then it means that there has been no new work
  since the last thread got sleepy.
* Odd -- If the low bit is one, then it means that new work was posted since
  the last thread got sleepy.

### New work is posted

When new work is posted, we check the value of the counter: if it is even,
then we increment it by one, so that it becomes odd.

### Worker thread gets sleepy

When a worker thread gets sleepy, it will read the value of the counter. If the
counter is odd, it will increment the counter so that it is even. Either way, it
remembers the final value of the counter. The final value will be used later,
when the thread is going to sleep. If at that time the counter has not changed,
then we can assume no new jobs have been posted (though note the remote
possibility of rollover, discussed in detail below).
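
As a minimal sketch of these two operations, consider a standalone `AtomicU64`
with illustrative names. In the real code the JEC shares one atomic word with
the thread counters and is updated via compare-and-exchange on that combined
value, so this is an assumption-laden simplification, not the actual API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative standalone JEC; not the packed representation in `counters.rs`.
struct JobsEventCounter {
    value: AtomicU64,
}

impl JobsEventCounter {
    /// Called when new work is posted: if the counter is even ("no new
    /// work since the last sleepy thread"), bump it to odd.
    fn note_new_work(&self) {
        let jec = self.value.load(Ordering::SeqCst);
        if jec % 2 == 0 {
            // Losing this race is fine: it means another poster already
            // made the counter odd.
            let _ = self.value.compare_exchange(
                jec,
                jec + 1,
                Ordering::SeqCst,
                Ordering::SeqCst,
            );
        }
    }

    /// Called when a worker gets sleepy: make the counter even if it is
    /// odd, and remember the final value. Before actually sleeping, the
    /// worker re-reads the counter and compares it to this value.
    fn get_sleepy(&self) -> u64 {
        loop {
            let jec = self.value.load(Ordering::SeqCst);
            if jec % 2 == 0 {
                // Already even: remember it as-is.
                return jec;
            }
            // Odd: try to make it even; retry if another thread raced us.
            if self
                .value
                .compare_exchange(jec, jec + 1, Ordering::SeqCst, Ordering::SeqCst)
                .is_ok()
            {
                return jec + 1;
            }
        }
    }
}
```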

# Protocol for a worker thread to post work

The full protocol for a thread to post work is as follows (see the sketch below):

* If the work is posted into the injection queue, then execute a seq-cst fence (see below).
* Load the counters, incrementing the JEC if it is even so that it is odd.
* Check if there are idle threads available to handle this new job. If not,
  and there are sleeping threads, then wake one or more threads.
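
Here is a sketch of that posting protocol, under the same assumptions as the
earlier counter sketches. The closure parameters (`load_counters_making_jec_odd`,
`wake_sleeping_threads`) and the snapshot type are illustrative stand-ins for
the real counter and registry operations, not rayon-core APIs.

```rust
use std::sync::atomic::{fence, Ordering};

/// Snapshot of the thread counters after the JEC has been made odd.
/// Field names are illustrative; the real code reads them out of the
/// single packed atomic word described earlier.
struct CountersSnapshot {
    idle_threads: u64,
    sleeping_threads: u64,
}

/// Sketch of the posting protocol. The closures stand in for the real
/// counter and registry operations.
fn post_work(
    // True if the job went into the injection queue (from outside the pool).
    injected: bool,
    // Load the counters, bumping the JEC to odd if it was even.
    load_counters_making_jec_odd: impl Fn() -> CountersSnapshot,
    // Wake one (or more) sleeping threads.
    wake_sleeping_threads: impl Fn(),
) {
    // For *injected* work only: a seq-cst fence so the job push cannot be
    // reordered after the counter read below (see the fence discussion).
    if injected {
        fence(Ordering::SeqCst);
    }

    // Load the counters; making the JEC odd tells any sleepy thread that
    // new work appeared since it got sleepy.
    let counters = load_counters_making_jec_odd();

    // Wake a sleeper only if nobody idle is already searching and there
    // is actually a sleeping thread to wake.
    if counters.idle_threads == 0 && counters.sleeping_threads > 0 {
        wake_sleeping_threads();
    }
}
```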

# Protocol for a worker thread to fall asleep

The full protocol for a thread to fall asleep is as follows (see the sketch below):

* After completing all its jobs, the worker goes idle and begins to
  search for work. As it searches, it counts "rounds". In each round,
  it searches all other worker threads' queues, plus the 'injector queue' for
  work injected from the outside. If work is found in this search, the thread
  becomes active again and hence restarts this protocol from the top.
* After a certain number of rounds, the thread "gets sleepy" and executes `get_sleepy`
  above, remembering the `final_value` of the JEC. It does one more search for work.
* If no work is found, the thread atomically:
  * Checks the JEC to see that it has not changed from `final_value`.
    * If it has, then the thread goes back to searching for work. It resets to
      the state just before it got sleepy, so that it will do one more search
      before attempting to sleep again (rather than searching for many rounds).
  * Increments the number of sleeping threads by 1.
* The thread then executes a seq-cst fence operation (see below).
* The thread then does one final check for injected jobs (see below). If any
  are available, it returns to the 'pre-sleepy' state as if the JEC had changed.
* The thread waits to be signaled. Once signaled, it returns to the idle state.
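
The tail end of this protocol, from the atomic JEC check through the final
injector check, can be sketched as follows. The closure parameters are
illustrative stand-ins for the real counter, queue, and condition-variable
operations, and the bookkeeping needed to undo the sleeping-thread increment
when we bail out after it is elided for brevity.

```rust
use std::sync::atomic::{fence, Ordering};

enum SleepOutcome {
    /// Return to the idle search loop, as if just before getting sleepy.
    SearchAgain,
    /// We slept and were signaled; return to the idle state.
    Woken,
}

/// Sketch of the final steps of the sleep protocol.
fn try_to_sleep(
    // The `final_value` remembered by `get_sleepy`.
    jec_at_sleepy: u64,
    // Atomically: if the JEC still equals the given value, increment the
    // sleeping-thread counter and return true; otherwise return false.
    try_increment_sleepers_if_jec_is: impl Fn(u64) -> bool,
    // Is there anything in the injector queue?
    injector_has_jobs: impl Fn() -> bool,
    // Block on the worker's condition variable until signaled.
    block_until_signaled: impl Fn(),
) -> SleepOutcome {
    // If the JEC moved, new work was posted since we got sleepy: go back
    // to searching rather than sleeping.
    if !try_increment_sleepers_if_jec_is(jec_at_sleepy) {
        return SleepOutcome::SearchAgain;
    }

    // Seq-cst fence: orders the increment above before the injector read
    // below, pairing with the fence executed by injecting threads.
    fence(Ordering::SeqCst);

    // Final check for injected jobs. Because it happens *after* the
    // increment, a racing injector either sees us counted as asleep (and
    // wakes someone) or we see its job here.
    if injector_has_jobs() {
        return SleepOutcome::SearchAgain;
    }

    // Nothing to do: block until another thread signals us.
    block_until_signaled();
    SleepOutcome::Woken
}
```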

# The jobs event counter and deadlock

As described in the section on the JEC, the main concern around going to sleep
is avoiding a race condition wherein:

* Thread A looks for work, finds none.
* Thread B posts work but sees no sleeping threads.
* Thread A goes to sleep.

The JEC protocol largely prevents this, but due to rollover, this prevention is
not complete. It is possible -- if unlikely -- that enough activity occurs for
Thread A to observe the same JEC value that it saw when getting sleepy. If the
new work being published came from *inside* the thread-pool, then this race
condition isn't too harmful. It means that we have fewer workers processing the
work than we should, but we won't deadlock. This seems like an acceptable risk
given that this is unlikely in practice.

However, if the work was posted as an *external* job, that is a problem. In that
case, it's possible that all of our workers could go to sleep, and the external
job would never get processed. To prevent that, the sleeping protocol includes
one final check to see if the injector queue is empty before fully falling
asleep. Note that this final check occurs **after** the number of sleeping
threads has been incremented. We are not concerned therefore with races against
injections that occur after that increment, only before.

Unfortunately, there is one rather subtle point concerning this final check:
we wish to avoid the possibility that:

* work is pushed into the injection queue by an outside thread X,
* the sleepy thread S reads the JEC, but the counter has rolled over and so
  appears unchanged, and
* the sleepy thread S reads the injection queue but does not see the work posted by X.

This is possible because the C++ memory model typically offers guarantees of the
form "if you see the access A, then you must see those other accesses" -- but it
doesn't guarantee that you will see the access A (i.e., if you think of
processors with independent caches, you may be operating on very out-of-date
cache state).

## Using seq-cst fences to prevent deadlock

To overcome this problem, we have inserted two sequentially consistent fence
operations into the protocols above:

* One fence occurs after work is posted into the injection queue, but before the
  counters are read (including the number of sleeping threads).
  * Note that no fence is needed for work posted to internal queues, since it is ok
    to overlook work in that case.
* One fence occurs after the number of sleeping threads is incremented, but
  before the injection queue is read.

### Proof sketch

What follows is a "proof sketch" that the protocol is deadlock-free. We model
two relevant bits of memory: the job injector queue J and the atomic counters C.

Consider the actions of the injecting thread:

* PushJob: Job is injected, which can be modeled as an atomic write to J with release semantics.
* PushFence: A sequentially consistent fence is executed.
* ReadSleepers: The counters C are read (they may also be incremented, but we just consider the read that comes first).

Meanwhile, the sleepy thread does the following:

* IncSleepers: The number of sleeping threads is incremented, which is an atomic exchange on C.
* SleepFence: A sequentially consistent fence is executed.
* ReadJob: We look to see if the queue is empty, which is a read of J with acquire semantics.

Because both fences are sequentially consistent, either PushFence or SleepFence
must come first in the total order of seq-cst operations:

* If PushFence comes first, then PushJob must be visible to ReadJob, so the
  sleepy thread's final check sees the injected job and it does not go to sleep.
* If SleepFence comes first, then IncSleepers is visible to ReadSleepers, so the
  injecting thread sees the sleeping thread when it checks the counters and can wake it.

In either case, the injected job cannot be stranded with every worker asleep.

# Deadlock detection

This module tracks a number of variables in order to detect deadlocks due to user code blocking.
These variables are stored in the `SleepData` struct, which itself is kept behind a mutex.
It contains the following fields:
- `worker_count` - The number of threads in the thread pool.
- `active_threads` - The number of threads in the thread pool which are running
  and aren't blocked in user code or sleeping.
- `blocked_threads` - The number of threads which are blocked in user code.
  This doesn't include threads blocked by Rayon.

User code can indicate blocking by calling `mark_blocked` before blocking and
calling `mark_unblocked` before unblocking a thread.
This will adjust `active_threads` and `blocked_threads` accordingly.

When we tickle the thread pool in `Sleep::tickle_cold`, we set `active_threads` to
`worker_count` - `blocked_threads`, since we wake up all Rayon threads, but not threads blocked
by user code.

A deadlock is detected by checking if `active_threads` is 0 and `blocked_threads` is above 0.
If we ignored `blocked_threads`, we would detect a deadlock
immediately when creating the thread pool.
We would also detect one once the thread pool ran out of work.
It is not possible for Rayon itself to deadlock.
Deadlocks can only be caused by user code blocking, so this condition doesn't miss any deadlocks.

We check for the deadlock condition when `active_threads` is decremented, which
happens in `mark_blocked` and when threads fall asleep in `Sleep::sleep`.
If a deadlock is detected, we call the user-provided deadlock handler while we hold the
lock to `SleepData`. This means the deadlock handler cannot call `mark_blocked` or
`mark_unblocked`. The user is expected to handle the deadlock in some non-Rayon thread.
Once the deadlock handler returns, the thread which called the deadlock handler will go to sleep.
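
To tie these pieces together, here is a minimal sketch of the bookkeeping
described in this section. The field names come from the text above; the free
functions, their signatures, and the exact points where the deadlock check runs
are illustrative assumptions rather than the real implementation.

```rust
use std::sync::Mutex;

/// Sketch of the deadlock-detection bookkeeping kept behind a mutex.
struct SleepData {
    /// Number of threads in the thread pool.
    worker_count: usize,
    /// Threads that are running and not blocked in user code or sleeping.
    active_threads: usize,
    /// Threads blocked in user code (not threads blocked by Rayon itself).
    blocked_threads: usize,
}

impl SleepData {
    /// The deadlock condition: no thread can make progress, and at least
    /// one thread is blocked in user code waiting on the pool.
    fn deadlock_detected(&self) -> bool {
        self.active_threads == 0 && self.blocked_threads > 0
    }
}

/// Called by user code before it blocks outside of Rayon.
fn mark_blocked(data: &Mutex<SleepData>) {
    let mut data = data.lock().unwrap();
    data.active_threads -= 1;
    data.blocked_threads += 1;
}

/// Called by user code before the blocked thread resumes.
fn mark_unblocked(data: &Mutex<SleepData>) {
    let mut data = data.lock().unwrap();
    data.blocked_threads -= 1;
    data.active_threads += 1;
}

/// When all Rayon threads are woken (`Sleep::tickle_cold` in the text),
/// every thread not blocked in user code becomes active again.
fn wake_all_workers(data: &Mutex<SleepData>) {
    let mut data = data.lock().unwrap();
    data.active_threads = data.worker_count - data.blocked_threads;
}

/// Invoked wherever `active_threads` is decremented, with the `SleepData`
/// lock held -- which is why the handler must not call `mark_blocked` or
/// `mark_unblocked` itself.
fn check_for_deadlock(data: &SleepData, deadlock_handler: &dyn Fn()) {
    if data.deadlock_detected() {
        deadlock_handler();
    }
}
```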