src/ch13-04-performance.md - rust-lang/book - Git at Google

 ## Comparing Performance: Loops vs. Iterators

 To determine whether to use loops or iterators, you need to know which
 implementation is faster: the version of the `search` function with an explicit
 `for` loop or the version with iterators.

 We ran a benchmark by loading the entire contents of _The Adventures of Sherlock
 Holmes_ by Sir Arthur Conan Doyle into a `String` and looking for the word _the_
 in the contents. Here are the results of the benchmark on the version of
 `search` using the `for` loop and the version using iterators:

 ```text
 test bench_search_for  ... bench:  19,620,300 ns/iter (+/- 915,700)
 test bench_search_iter ... bench:  19,234,900 ns/iter (+/- 657,200)
 ```

 The iterator version was slightly faster! We won’t explain the benchmark code
 here, because the point is not to prove that the two versions are equivalent but
 to get a general sense of how these two implementations compare
 performance-wise.

 For a more comprehensive benchmark, you should check using various texts of
 various sizes as the `contents`, different words and words of different lengths
 as the `query`, and all kinds of other variations. The point is this: iterators,
 although a high-level abstraction, get compiled down to roughly the same code as
 if you’d written the lower-level code yourself. Iterators are one of Rust’s
 _zero-cost abstractions_, by which we mean using the abstraction imposes no
 additional runtime overhead. This is analogous to how Bjarne Stroustrup, the
 original designer and implementor of C++, defines _zero-overhead_ in
 “Foundations of C++” (2012):

 > In general, C++ implementations obey the zero-overhead principle: What you
 > don’t use, you don’t pay for. And further: What you do use, you couldn’t hand
 > code any better.

 As another example, the following code is taken from an audio decoder. The
 decoding algorithm uses the linear prediction mathematical operation to estimate
 future values based on a linear function of the previous samples. This code uses
 an iterator chain to do some math on three variables in scope: a `buffer` slice
 of data, an array of 12 `coefficients`, and an amount by which to shift data in
 `qlp_shift`. We’ve declared the variables within this example but not given them
 any values; although this code doesn’t have much meaning outside of its context,
 it’s still a concise, real-world example of how Rust translates high-level ideas
 to low-level code.

 ```rust,ignore
 let buffer: &mut [i32];
 let coefficients: [i64; 12];
 let qlp_shift: i16;

 for i in 12..buffer.len() {
     let prediction = coefficients.iter()
                                  .zip(&buffer[i - 12..i])
                                  .map(|(&c, &s)| c * s as i64)
                                  .sum::<i64>() >> qlp_shift;
     let delta = buffer[i];
     buffer[i] = prediction as i32 + delta;
 }
 ```

 To calculate the value of `prediction`, this code iterates through each of the
 12 values in `coefficients` and uses the `zip` method to pair the coefficient
 values with the previous 12 values in `buffer`. Then, for each pair, we multiply
 the values together, sum all the results, and shift the bits in the sum
 `qlp_shift` bits to the right.

 Calculations in applications like audio decoders often prioritize performance
 most highly. Here, we’re creating an iterator, using two adapters, and then
 consuming the value. What assembly code would this Rust code compile to? Well,
 as of this writing, it compiles down to the same assembly you’d write by hand.
 There’s no loop at all corresponding to the iteration over the values in
 `coefficients`: Rust knows that there are 12 iterations, so it “unrolls” the
 loop. _Unrolling_ is an optimization that removes the overhead of the loop
 controlling code and instead generates repetitive code for each iteration of the
 loop.

 All of the coefficients get stored in registers, which means accessing the
 values is very fast. There are no bounds checks on the array access at runtime.
 All these optimizations that Rust is able to apply make the resulting code
 extremely efficient. Now that you know this, you can use iterators and closures
 without fear! They make code seem like it’s higher level but don’t impose a
 runtime performance penalty for doing so.

 ## Summary

 Closures and iterators are Rust features inspired by functional programming
 language ideas. They contribute to Rust’s capability to clearly express
 high-level ideas at low-level performance. The implementations of closures and
 iterators are such that runtime performance is not affected. This is part of
 Rust’s goal to strive to provide zero-cost abstractions.

 Now that we’ve improved the expressiveness of our I/O project, let’s look at
 some more features of `cargo` that will help us share the project with the
 world.
	## Comparing Performance: Loops vs. Iterators

	To determine whether to use loops or iterators, you need to know which
	implementation is faster: the version of the `search` function with an explicit
	`for` loop or the version with iterators.

	We ran a benchmark by loading the entire contents of _The Adventures of Sherlock
	Holmes_ by Sir Arthur Conan Doyle into a `String` and looking for the word _the_
	in the contents. Here are the results of the benchmark on the version of
	`search` using the `for` loop and the version using iterators:

	```text
	test bench_search_for ... bench: 19,620,300 ns/iter (+/- 915,700)
	test bench_search_iter ... bench: 19,234,900 ns/iter (+/- 657,200)
	```

	The iterator version was slightly faster! We won’t explain the benchmark code
	here, because the point is not to prove that the two versions are equivalent but
	to get a general sense of how these two implementations compare
	performance-wise.

	For a more comprehensive benchmark, you should check using various texts of
	various sizes as the `contents`, different words and words of different lengths
	as the `query`, and all kinds of other variations. The point is this: iterators,
	although a high-level abstraction, get compiled down to roughly the same code as
	if you’d written the lower-level code yourself. Iterators are one of Rust’s
	_zero-cost abstractions_, by which we mean using the abstraction imposes no
	additional runtime overhead. This is analogous to how Bjarne Stroustrup, the
	original designer and implementor of C++, defines _zero-overhead_ in
	“Foundations of C++” (2012):

	> In general, C++ implementations obey the zero-overhead principle: What you
	> don’t use, you don’t pay for. And further: What you do use, you couldn’t hand
	> code any better.

	As another example, the following code is taken from an audio decoder. The
	decoding algorithm uses the linear prediction mathematical operation to estimate
	future values based on a linear function of the previous samples. This code uses
	an iterator chain to do some math on three variables in scope: a `buffer` slice
	of data, an array of 12 `coefficients`, and an amount by which to shift data in
	`qlp_shift`. We’ve declared the variables within this example but not given them
	any values; although this code doesn’t have much meaning outside of its context,
	it’s still a concise, real-world example of how Rust translates high-level ideas
	to low-level code.

	```rust,ignore
	let buffer: &mut [i32];
	let coefficients: [i64; 12];
	let qlp_shift: i16;

	for i in 12..buffer.len() {
	let prediction = coefficients.iter()
	.zip(&buffer[i - 12..i])
	.map(\|(&c, &s)\| c * s as i64)
	.sum::<i64>() >> qlp_shift;
	let delta = buffer[i];
	buffer[i] = prediction as i32 + delta;
	}
	```

	To calculate the value of `prediction`, this code iterates through each of the
	12 values in `coefficients` and uses the `zip` method to pair the coefficient
	values with the previous 12 values in `buffer`. Then, for each pair, we multiply
	the values together, sum all the results, and shift the bits in the sum
	`qlp_shift` bits to the right.

	Calculations in applications like audio decoders often prioritize performance
	most highly. Here, we’re creating an iterator, using two adapters, and then
	consuming the value. What assembly code would this Rust code compile to? Well,
	as of this writing, it compiles down to the same assembly you’d write by hand.
	There’s no loop at all corresponding to the iteration over the values in
	`coefficients`: Rust knows that there are 12 iterations, so it “unrolls” the
	loop. _Unrolling_ is an optimization that removes the overhead of the loop
	controlling code and instead generates repetitive code for each iteration of the
	loop.

	All of the coefficients get stored in registers, which means accessing the
	values is very fast. There are no bounds checks on the array access at runtime.
	All these optimizations that Rust is able to apply make the resulting code
	extremely efficient. Now that you know this, you can use iterators and closures
	without fear! They make code seem like it’s higher level but don’t impose a
	runtime performance penalty for doing so.

	## Summary

	Closures and iterators are Rust features inspired by functional programming
	language ideas. They contribute to Rust’s capability to clearly express
	high-level ideas at low-level performance. The implementations of closures and
	iterators are such that runtime performance is not affected. This is part of
	Rust’s goal to strive to provide zero-cost abstractions.

	Now that we’ve improved the expressiveness of our I/O project, let’s look at
	some more features of `cargo` that will help us share the project with the
	world.