src/ch21-01-single-threaded.md - rust-lang/book - Git at Google

 ## Building a Single-Threaded Web Server

 We’ll start by getting a single-threaded web server working. Before we begin,
 let’s look at a quick overview of the protocols involved in building web
 servers. The details of these protocols are beyond the scope of this book, but a
 brief overview will give you the information you need.

 The two main protocols involved in web servers are _Hypertext Transfer Protocol_
 _(HTTP)_ and _Transmission Control Protocol_ _(TCP)_. Both protocols are
 _request-response_ protocols, meaning a _client_ initiates requests and a
 _server_ listens to the requests and provides a response to the client. The
 contents of those requests and responses are defined by the protocols.

 TCP is the lower-level protocol that describes the details of how information
 gets from one server to another but doesn’t specify what that information is.
 HTTP builds on top of TCP by defining the contents of the requests and
 responses. It’s technically possible to use HTTP with other protocols, but in
 the vast majority of cases, HTTP sends its data over TCP. We’ll work with the
 raw bytes of TCP and HTTP requests and responses.

 ### Listening to the TCP Connection

 Our web server needs to listen to a TCP connection, so that’s the first part
 we’ll work on. The standard library offers a `std::net` module that lets us do
 this. Let’s make a new project in the usual fashion:

 ```console
 $ cargo new hello
      Created binary (application) `hello` project
 $ cd hello
 ```

 Now enter the code in Listing 21-1 in _src/main.rs_ to start. This code will
 listen at the local address `127.0.0.1:7878` for incoming TCP streams. When it
 gets an incoming stream, it will print `Connection established!`.

 <Listing number="21-1" file-name="src/main.rs" caption="Listening for incoming streams and printing a message when we receive a stream">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-01/src/main.rs}}
 ```

 </Listing>

 Using `TcpListener`, we can listen for TCP connections at the address
 `127.0.0.1:7878`. In the address, the section before the colon is an IP address
 representing your computer (this is the same on every computer and doesn’t
 represent the authors’ computer specifically), and `7878` is the port. We’ve
 chosen this port for two reasons: HTTP isn’t normally accepted on this port so
 our server is unlikely to conflict with any other web server you might have
 running on your machine, and 7878 is _rust_ typed on a telephone.

 The `bind` function in this scenario works like the `new` function in that it
 will return a new `TcpListener` instance. The function is called `bind` because,
 in networking, connecting to a port to listen to is known as “binding to a
 port.”

 The `bind` function returns a `Result<T, E>`, which indicates that it’s possible
 for binding to fail. For example, connecting to port 80 requires administrator
 privileges (nonadministrators can listen only on ports higher than 1023), so if
 we tried to connect to port 80 without being an administrator, binding wouldn’t
 work. Binding also wouldn’t work, for example, if we ran two instances of our
 program and so had two programs listening to the same port. Because we’re
 writing a basic server just for learning purposes, we won’t worry about handling
 these kinds of errors; instead, we use `unwrap` to stop the program if errors
 happen.

 The `incoming` method on `TcpListener` returns an iterator that gives us a
 sequence of streams (more specifically, streams of type `TcpStream`). A single
 _stream_ represents an open connection between the client and the server. A
 _connection_ is the name for the full request and response process in which a
 client connects to the server, the server generates a response, and the server
 closes the connection. As such, we will read from the `TcpStream` to see what
 the client sent and then write our response to the stream to send data back to
 the client. Overall, this `for` loop will process each connection in turn and
 produce a series of streams for us to handle.

 For now, our handling of the stream consists of calling `unwrap` to terminate
 our program if the stream has any errors; if there aren’t any errors, the
 program prints a message. We’ll add more functionality for the success case in
 the next listing. The reason we might receive errors from the `incoming` method
 when a client connects to the server is that we’re not actually iterating over
 connections. Instead, we’re iterating over _connection attempts_. The connection
 might not be successful for a number of reasons, many of them operating system
 specific. For example, many operating systems have a limit to the number of
 simultaneous open connections they can support; new connection attempts beyond
 that number will produce an error until some of the open connections are closed.

 Let’s try running this code! Invoke `cargo run` in the terminal and then load
 _127.0.0.1:7878_ in a web browser. The browser should show an error message like
 “Connection reset,” because the server isn’t currently sending back any data.
 But when you look at your terminal, you should see several messages that were
 printed when the browser connected to the server!

 ```text
      Running `target/debug/hello`
 Connection established!
 Connection established!
 Connection established!
 ```

 Sometimes, you’ll see multiple messages printed for one browser request; the
 reason might be that the browser is making a request for the page as well as a
 request for other resources, like the _favicon.ico_ icon that appears in the
 browser tab.

 It could also be that the browser is trying to connect to the server multiple
 times because the server isn’t responding with any data. When `stream` goes out
 of scope and is dropped at the end of the loop, the connection is closed as part
 of the `drop` implementation. Browsers sometimes deal with closed connections by
 retrying, because the problem might be temporary. The important factor is that
 we’ve successfully gotten a handle to a TCP connection!

 Remember to stop the program by pressing <kbd>ctrl</kbd>-<kbd>c</kbd> when
 you’re done running a particular version of the code. Then restart the program
 by invoking the `cargo run` command after you’ve made each set of code changes
 to make sure you’re running the newest code.

 ### Reading the Request

 Let’s implement the functionality to read the request from the browser! To
 separate the concerns of first getting a connection and then taking some action
 with the connection, we’ll start a new function for processing connections. In
 this new `handle_connection` function, we’ll read data from the TCP stream and
 print it so we can see the data being sent from the browser. Change the code to
 look like Listing 21-2.

 <Listing number="21-2" file-name="src/main.rs" caption="Reading from the `TcpStream` and printing the data">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-02/src/main.rs}}
 ```

 </Listing>

 We bring `std::io::prelude` and `std::io::BufReader` into scope to get access to
 traits and types that let us read from and write to the stream. In the `for`
 loop in the `main` function, instead of printing a message that says we made a
 connection, we now call the new `handle_connection` function and pass the
 `stream` to it.

 In the `handle_connection` function, we create a new `BufReader` instance that
 wraps a reference to the `stream`. The `BufReader` adds buffering by managing
 calls to the `std::io::Read` trait methods for us.

 We create a variable named `http_request` to collect the lines of the request
 the browser sends to our server. We indicate that we want to collect these lines
 in a vector by adding the `Vec<_>` type annotation.

 `BufReader` implements the `std::io::BufRead` trait, which provides the `lines`
 method. The `lines` method returns an iterator of
 `Result<String,
 std::io::Error>` by splitting the stream of data whenever it sees
 a newline byte. To get each `String`, we map and `unwrap` each `Result`. The
 `Result` might be an error if the data isn’t valid UTF-8 or if there was a
 problem reading from the stream. Again, a production program should handle these
 errors more gracefully, but we’re choosing to stop the program in the error case
 for simplicity.

 The browser signals the end of an HTTP request by sending two newline characters
 in a row, so to get one request from the stream, we take lines until we get a
 line that is the empty string. Once we’ve collected the lines into the vector,
 we’re printing them out using pretty debug formatting so we can take a look at
 the instructions the web browser is sending to our server.

 Let’s try this code! Start the program and make a request in a web browser
 again. Note that we’ll still get an error page in the browser, but our program’s
 output in the terminal will now look similar to this:

 ```console
 $ cargo run
    Compiling hello v0.1.0 (file:///projects/hello)
     Finished dev [unoptimized + debuginfo] target(s) in 0.42s
      Running `target/debug/hello`
 Request: [
     "GET / HTTP/1.1",
     "Host: 127.0.0.1:7878",
     "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:99.0) Gecko/20100101 Firefox/99.0",
     "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
     "Accept-Language: en-US,en;q=0.5",
     "Accept-Encoding: gzip, deflate, br",
     "DNT: 1",
     "Connection: keep-alive",
     "Upgrade-Insecure-Requests: 1",
     "Sec-Fetch-Dest: document",
     "Sec-Fetch-Mode: navigate",
     "Sec-Fetch-Site: none",
     "Sec-Fetch-User: ?1",
     "Cache-Control: max-age=0",
 ]
 ```

 Depending on your browser, you might get slightly different output. Now that
 we’re printing the request data, we can see why we get multiple connections from
 one browser request by looking at the path after `GET` in the first line of the
 request. If the repeated connections are all requesting _/_, we know the browser
 is trying to fetch _/_ repeatedly because it’s not getting a response from our
 program.

 Let’s break down this request data to understand what the browser is asking of
 our program.

 ### A Closer Look at an HTTP Request

 HTTP is a text-based protocol, and a request takes this format:

 ```text
 Method Request-URI HTTP-Version CRLF
 headers CRLF
 message-body
 ```

 The first line is the _request line_ that holds information about what the
 client is requesting. The first part of the request line indicates the _method_
 being used, such as `GET` or `POST`, which describes how the client is making
 this request. Our client used a `GET` request, which means it is asking for
 information.

 The next part of the request line is _/_, which indicates the _Uniform Resource
 Identifier_ _(URI)_ the client is requesting: a URI is almost, but not quite,
 the same as a _Uniform Resource Locator_ _(URL)_. The difference between URIs
 and URLs isn’t important for our purposes in this chapter, but the HTTP spec
 uses the term URI, so we can just mentally substitute URL for URI here.

 The last part is the HTTP version the client uses, and then the request line
 ends in a _CRLF sequence_. (CRLF stands for _carriage return_ and _line feed_,
 which are terms from the typewriter days!) The CRLF sequence can also be written
 as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The CRLF
 sequence separates the request line from the rest of the request data. Note that
 when the CRLF is printed, we see a new line start rather than `\r\n`.

 Looking at the request line data we received from running our program so far, we
 see that `GET` is the method, _/_ is the request URI, and `HTTP/1.1` is the
 version.

 After the request line, the remaining lines starting from `Host:` onward are
 headers. `GET` requests have no body.

 Try making a request from a different browser or asking for a different address,
 such as _127.0.0.1:7878/test_, to see how the request data changes.

 Now that we know what the browser is asking for, let’s send back some data!

 ### Writing a Response

 We’re going to implement sending data in response to a client request. Responses
 have the following format:

 ```text
 HTTP-Version Status-Code Reason-Phrase CRLF
 headers CRLF
 message-body
 ```

 The first line is a _status line_ that contains the HTTP version used in the
 response, a numeric status code that summarizes the result of the request, and a
 reason phrase that provides a text description of the status code. After the
 CRLF sequence are any headers, another CRLF sequence, and the body of the
 response.

 Here is an example response that uses HTTP version 1.1, has a status code of
 200, an OK reason phrase, no headers, and no body:

 ```text
 HTTP/1.1 200 OK\r\n\r\n
 ```

 The status code 200 is the standard success response. The text is a tiny
 successful HTTP response. Let’s write this to the stream as our response to a
 successful request! From the `handle_connection` function, remove the `println!`
 that was printing the request data and replace it with the code in Listing 21-3.

 <Listing number="21-3" file-name="src/main.rs" caption="Writing a tiny successful HTTP response to the stream">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-03/src/main.rs:here}}
 ```

 </Listing>

 The first new line defines the `response` variable that holds the success
 message’s data. Then we call `as_bytes` on our `response` to convert the string
 data to bytes. The `write_all` method on `stream` takes a `&[u8]` and sends
 those bytes directly down the connection. Because the `write_all` operation
 could fail, we use `unwrap` on any error result as before. Again, in a real
 application you would add error handling here.

 With these changes, let’s run our code and make a request. We’re no longer
 printing any data to the terminal, so we won’t see any output other than the
 output from Cargo. When you load _127.0.0.1:7878_ in a web browser, you should
 get a blank page instead of an error. You’ve just hand-coded receiving an HTTP
 request and sending a response!

 ### Returning Real HTML

 Let’s implement the functionality for returning more than a blank page. Create
 the new file _hello.html_ in the root of your project directory, not in the
 _src_ directory. You can input any HTML you want; Listing 21-4 shows one
 possibility.

 <Listing number="21-4" file-name="hello.html" caption="A sample HTML file to return in a response">

 ```html
 {{#include ../listings/ch21-web-server/listing-21-05/hello.html}}
 ```

 </Listing>

 This is a minimal HTML5 document with a heading and some text. To return this
 from the server when a request is received, we’ll modify `handle_connection` as
 shown in Listing 21-5 to read the HTML file, add it to the response as a body,
 and send it.

 <Listing number="21-5" file-name="src/main.rs" caption="Sending the contents of *hello.html* as the body of the response">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-05/src/main.rs:here}}
 ```

 </Listing>

 We’ve added `fs` to the `use` statement to bring the standard library’s
 filesystem module into scope. The code for reading the contents of a file to a
 string should look familiar; we used it in Chapter 12 when we read the contents
 of a file for our I/O project in Listing 12-4.

 Next, we use `format!` to add the file’s contents as the body of the success
 response. To ensure a valid HTTP response, we add the `Content-Length` header
 which is set to the size of our response body, in this case the size of
 `hello.html`.

 Run this code with `cargo run` and load _127.0.0.1:7878_ in your browser; you
 should see your HTML rendered!

 Currently, we’re ignoring the request data in `http_request` and just sending
 back the contents of the HTML file unconditionally. That means if you try
 requesting _127.0.0.1:7878/something-else_ in your browser, you’ll still get
 back this same HTML response. At the moment, our server is very limited and does
 not do what most web servers do. We want to customize our responses depending on
 the request and only send back the HTML file for a well-formed request to _/_.

 ### Validating the Request and Selectively Responding

 Right now, our web server will return the HTML in the file no matter what the
 client requested. Let’s add functionality to check that the browser is
 requesting _/_ before returning the HTML file and return an error if the browser
 requests anything else. For this we need to modify `handle_connection`, as shown
 in Listing 21-6. This new code checks the content of the request received
 against what we know a request for _/_ looks like and adds `if` and `else`
 blocks to treat requests differently.

 <Listing number="21-6" file-name="src/main.rs" caption="Handling requests to */* differently from other requests">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-06/src/main.rs:here}}
 ```

 </Listing>

 We’re only going to be looking at the first line of the HTTP request, so rather
 than reading the entire request into a vector, we’re calling `next` to get the
 first item from the iterator. The first `unwrap` takes care of the `Option` and
 stops the program if the iterator has no items. The second `unwrap` handles the
 `Result` and has the same effect as the `unwrap` that was in the `map` added in
 Listing 21-2.

 Next, we check the `request_line` to see if it equals the request line of a GET
 request to the _/_ path. If it does, the `if` block returns the contents of our
 HTML file.

 If the `request_line` does _not_ equal the GET request to the _/_ path, it means
 we’ve received some other request. We’ll add code to the `else` block in a
 moment to respond to all other requests.

 Run this code now and request _127.0.0.1:7878_; you should get the HTML in
 _hello.html_. If you make any other request, such as
 _127.0.0.1:7878/something-else_, you’ll get a connection error like those you
 saw when running the code in Listing 21-1 and Listing 21-2.

 Now let’s add the code in Listing 21-7 to the `else` block to return a response
 with the status code 404, which signals that the content for the request was not
 found. We’ll also return some HTML for a page to render in the browser
 indicating the response to the end user.

 <Listing number="21-7" file-name="src/main.rs" caption="Responding with status code 404 and an error page if anything other than */* was requested">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-07/src/main.rs:here}}
 ```

 </Listing>

 Here, our response has a status line with status code 404 and the reason phrase
 `NOT FOUND`. The body of the response will be the HTML in the file _404.html_.
 You’ll need to create a _404.html_ file next to _hello.html_ for the error page;
 again feel free to use any HTML you want or use the example HTML in Listing
 21-8.

 <Listing number="21-8" file-name="404.html" caption="Sample content for the page to send back with any 404 response">

 ```html
 {{#include ../listings/ch21-web-server/listing-21-07/404.html}}
 ```

 </Listing>

 With these changes, run your server again. Requesting _127.0.0.1:7878_ should
 return the contents of _hello.html_, and any other request, like
 _127.0.0.1:7878/foo_, should return the error HTML from _404.html_.

 ### A Touch of Refactoring

 At the moment the `if` and `else` blocks have a lot of repetition: they’re both
 reading files and writing the contents of the files to the stream. The only
 differences are the status line and the filename. Let’s make the code more
 concise by pulling out those differences into separate `if` and `else` lines
 that will assign the values of the status line and the filename to variables; we
 can then use those variables unconditionally in the code to read the file and
 write the response. Listing 21-9 shows the resulting code after replacing the
 large `if` and `else` blocks.

 <Listing number="21-9" file-name="src/main.rs" caption="Refactoring the `if` and `else` blocks to contain only the code that differs between the two cases">

 ```rust,no_run
 {{#rustdoc_include ../listings/ch21-web-server/listing-21-09/src/main.rs:here}}
 ```

 </Listing>

 Now the `if` and `else` blocks only return the appropriate values for the status
 line and filename in a tuple; we then use destructuring to assign these two
 values to `status_line` and `filename` using a pattern in the `let` statement,
 as discussed in Chapter 19.

 The previously duplicated code is now outside the `if` and `else` blocks and
 uses the `status_line` and `filename` variables. This makes it easier to see the
 difference between the two cases, and it means we have only one place to update
 the code if we want to change how the file reading and response writing work.
 The behavior of the code in Listing 21-9 will be the same as that in Listing
 21-7.

 Awesome! We now have a simple web server in approximately 40 lines of Rust code
 that responds to one request with a page of content and responds to all other
 requests with a 404 response.

 Currently, our server runs in a single thread, meaning it can only serve one
 request at a time. Let’s examine how that can be a problem by simulating some
 slow requests. Then we’ll fix it so our server can handle multiple requests at
 once.