## Building a Single-Threaded Web Server
We’ll start by getting a single-threaded web server working. Before we begin,
let’s look at a quick overview of the protocols involved in building web
servers. The details of these protocols are beyond the scope of this book, but a
brief overview will give you the information you need.
The two main protocols involved in web servers are _Hypertext Transfer Protocol_
_(HTTP)_ and _Transmission Control Protocol_ _(TCP)_. Both protocols are
_request-response_ protocols, meaning a _client_ initiates requests and a
_server_ listens to the requests and provides a response to the client. The
contents of those requests and responses are defined by the protocols.
TCP is the lower-level protocol that describes the details of how information
gets from one server to another but doesn’t specify what that information is.
HTTP builds on top of TCP by defining the contents of the requests and
responses. It’s technically possible to use HTTP with other protocols, but in
the vast majority of cases, HTTP sends its data over TCP. We’ll work with the
raw bytes of TCP and HTTP requests and responses.
### Listening to the TCP Connection
Our web server needs to listen to a TCP connection, so that’s the first part
we’ll work on. The standard library offers a `std::net` module that lets us do
this. Let’s make a new project in the usual fashion:
```console
$ cargo new hello
Created binary (application) `hello` project
$ cd hello
```
Now enter the code in Listing 21-1 in _src/main.rs_ to start. This code will
listen at the local address `127.0.0.1:7878` for incoming TCP streams. When it
gets an incoming stream, it will print `Connection established!`.
<Listing number="21-1" file-name="src/main.rs" caption="Listening for incoming streams and printing a message when we receive a stream">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-01/src/main.rs}}
```
</Listing>
Using `TcpListener`, we can listen for TCP connections at the address
`127.0.0.1:7878`. In the address, the section before the colon is an IP address
representing your computer (this is the same on every computer and doesn’t
represent the authors’ computer specifically), and `7878` is the port. We’ve
chosen this port for two reasons: HTTP isn’t normally accepted on this port, so
our server is unlikely to conflict with any other web server you might have
running on your machine, and 7878 is _rust_ typed on a telephone.
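To make the two halves of the address concrete, here is a small sketch that uses the standard library's `SocketAddr` parser to split `127.0.0.1:7878` into its IP address and port. The `split_address` helper is a hypothetical name used only for illustration; it is not part of the listings in this chapter.

```rust
use std::net::{IpAddr, SocketAddr};

// Hypothetical helper (not from the listings): split an "ip:port" string
// into its two parts using the standard library's parser.
fn split_address(addr: &str) -> (IpAddr, u16) {
    let parsed: SocketAddr = addr.parse().unwrap();
    (parsed.ip(), parsed.port())
}

fn main() {
    // The same address string we pass to `TcpListener::bind`.
    let (ip, port) = split_address("127.0.0.1:7878");

    // 127.0.0.1 is a loopback address, so it always refers to the local machine.
    println!("ip: {ip}, loopback: {}", ip.is_loopback());
    println!("port: {port}");
}
```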
The `bind` function in this scenario works like the `new` function in that it
will return a new `TcpListener` instance. The function is called `bind` because,
in networking, connecting to a port to listen to is known as “binding to a
port.”
The `bind` function returns a `Result<T, E>`, which indicates that it’s possible
for binding to fail. For example, listening on port 80 requires administrator
privileges (nonadministrators can listen only on ports higher than 1023), so if
we tried to bind to port 80 without being an administrator, binding wouldn’t
work. Binding also wouldn’t work, for example, if we ran two instances of our
program and so had two programs listening to the same port. Because we’re
writing a basic server just for learning purposes, we won’t worry about handling
these kinds of errors; instead, we use `unwrap` to stop the program if errors
happen.
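For comparison, here is a minimal sketch of what handling the two failure cases mentioned above might look like instead of calling `unwrap`. The `explain_bind_error` helper and its messages are assumptions made for illustration; the chapter's listings keep using `unwrap`.

```rust
use std::io::{self, ErrorKind};
use std::net::TcpListener;

// Hypothetical helper (not from the listings): turn a bind error into a
// human-readable hint for the two cases described in the text.
fn explain_bind_error(err: &io::Error) -> String {
    match err.kind() {
        ErrorKind::PermissionDenied => String::from(
            "permission denied: low-numbered ports usually need administrator privileges",
        ),
        ErrorKind::AddrInUse => String::from(
            "address in use: another program is already listening on this port",
        ),
        // Any other failure is reported with the operating system's message.
        _ => format!("could not bind: {err}"),
    }
}

fn main() {
    match TcpListener::bind("127.0.0.1:7878") {
        Ok(listener) => println!("listening on {}", listener.local_addr().unwrap()),
        Err(err) => eprintln!("{}", explain_bind_error(&err)),
    }
}
```

Running two copies of this program at once should make the second one print the "address in use" hint rather than panic.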
The `incoming` method on `TcpListener` returns an iterator that gives us a
sequence of streams (more specifically, streams of type `TcpStream`). A single
_stream_ represents an open connection between the client and the server. A
_connection_ is the name for the full request and response process in which a
client connects to the server, the server generates a response, and the server
closes the connection. As such, we will read from the `TcpStream` to see what
the client sent and then write our response to the stream to send data back to
the client. Overall, this `for` loop will process each connection in turn and
produce a series of streams for us to handle.
For now, our handling of the stream consists of calling `unwrap` to terminate
our program if the stream has any errors; if there aren’t any errors, the
program prints a message. We’ll add more functionality for the success case in
the next listing. The reason we might receive errors from the `incoming` method
when a client connects to the server is that we’re not actually iterating over
connections. Instead, we’re iterating over _connection attempts_. The connection
might not be successful for a number of reasons, many of them operating system
specific. For example, many operating systems have a limit to the number of
simultaneous open connections they can support; new connection attempts beyond
that number will produce an error until some of the open connections are closed.
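Because each item from `incoming` is a `Result`, a server that should keep running could skip failed attempts rather than stop. The sketch below illustrates that idea with a hypothetical `count_attempts` helper that is generic over the success type, so the same logic can also be exercised without a real socket. It is an illustration only and not how the chapter's listing is written.

```rust
use std::io;
use std::net::TcpListener;

// Hypothetical helper (not from the listings): walk over connection attempts,
// logging failed ones instead of stopping. Returns (successes, failures).
fn count_attempts<T>(attempts: impl Iterator<Item = io::Result<T>>) -> (usize, usize) {
    let mut ok = 0;
    let mut failed = 0;
    for attempt in attempts {
        match attempt {
            Ok(_stream) => {
                println!("Connection established!");
                ok += 1;
            }
            Err(err) => {
                eprintln!("Connection attempt failed: {err}");
                failed += 1;
            }
        }
    }
    (ok, failed)
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    // `incoming` never ends on its own; `take(3)` stops this demo after
    // three connection attempts.
    let (ok, failed) = count_attempts(listener.incoming().take(3));
    println!("{ok} succeeded, {failed} failed");
}
```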
Let’s try running this code! Invoke `cargo run` in the terminal and then load
_127.0.0.1:7878_ in a web browser. The browser should show an error message like
“Connection reset,” because the server isn’t currently sending back any data.
But when you look at your terminal, you should see several messages that were
printed when the browser connected to the server!
```text
Running `target/debug/hello`
Connection established!
Connection established!
Connection established!
```
Sometimes, you’ll see multiple messages printed for one browser request; the
reason might be that the browser is making a request for the page as well as a
request for other resources, like the _favicon.ico_ icon that appears in the
browser tab.
It could also be that the browser is trying to connect to the server multiple
times because the server isn’t responding with any data. When `stream` goes out
of scope and is dropped at the end of the loop, the connection is closed as part
of the `drop` implementation. Browsers sometimes deal with closed connections by
retrying, because the problem might be temporary. The important factor is that
we’ve successfully gotten a handle to a TCP connection!
Remember to stop the program by pressing <kbd>ctrl</kbd>-<kbd>c</kbd> when
you’re done running a particular version of the code. Then restart the program
by invoking the `cargo run` command after you’ve made each set of code changes
to make sure you’re running the newest code.
### Reading the Request
Let’s implement the functionality to read the request from the browser! To
separate the concerns of first getting a connection and then taking some action
with the connection, we’ll start a new function for processing connections. In
this new `handle_connection` function, we’ll read data from the TCP stream and
print it so we can see the data being sent from the browser. Change the code to
look like Listing 21-2.
<Listing number="21-2" file-name="src/main.rs" caption="Reading from the `TcpStream` and printing the data">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-02/src/main.rs}}
```
</Listing>
We bring `std::io::prelude` and `std::io::BufReader` into scope to get access to
traits and types that let us read from and write to the stream. In the `for`
loop in the `main` function, instead of printing a message that says we made a
connection, we now call the new `handle_connection` function and pass the
`stream` to it.
In the `handle_connection` function, we create a new `BufReader` instance that
wraps a reference to the `stream`. The `BufReader` adds buffering by managing
calls to the `std::io::Read` trait methods for us.
We create a variable named `http_request` to collect the lines of the request
the browser sends to our server. We indicate that we want to collect these lines
in a vector by adding the `Vec<_>` type annotation.
`BufReader` implements the `std::io::BufRead` trait, which provides the `lines`
method. The `lines` method returns an iterator of
`Result<String, std::io::Error>` by splitting the stream of data whenever it sees
a newline byte. To get each `String`, we map and `unwrap` each `Result`. The
`Result` might be an error if the data isn’t valid UTF-8 or if there was a
problem reading from the stream. Again, a production program should handle these
errors more gracefully, but we’re choosing to stop the program in the error case
for simplicity.
The browser signals the end of an HTTP request by sending two newline characters
in a row, so to get one request from the stream, we take lines until we get a
line that is the empty string. Once we’ve collected the lines into the vector,
we’re printing them out using pretty debug formatting so we can take a look at
the instructions the web browser is sending to our server.
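The line-collecting steps described above can be tried without a browser. The following sketch applies the same `BufReader`, `lines`, and "stop at the first empty line" approach to an in-memory byte slice. The `read_request_head` name and the sample request bytes are made up for illustration.

```rust
use std::io::{BufRead, BufReader, Read};

// Hypothetical helper (not from the listings): collect request lines up to
// the first empty line. Generic over any reader so it can run on bytes in
// memory as well as on a `TcpStream`.
fn read_request_head<R: Read>(reader: R) -> Vec<String> {
    BufReader::new(reader)
        .lines()
        // Stop the program on invalid UTF-8 or a read error, as in the text.
        .map(|result| result.unwrap())
        // The empty line marks the end of the request line and headers.
        .take_while(|line| !line.is_empty())
        .collect()
}

fn main() {
    // A made-up request; `&[u8]` implements `Read`, so no socket is needed.
    let raw = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:7878\r\n\r\nignored";
    let head = read_request_head(&raw[..]);
    println!("Request: {head:#?}");
}
```

Note that `lines` strips the trailing `\r\n` from each line, which is why the blank separator line shows up as an empty string.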
Let’s try this code! Start the program and make a request in a web browser
again. Note that we’ll still get an error page in the browser, but our program’s
output in the terminal will now look similar to this:
```console
$ cargo run
Compiling hello v0.1.0 (file:///projects/hello)
Finished dev [unoptimized + debuginfo] target(s) in 0.42s
Running `target/debug/hello`
Request: [
    "GET / HTTP/1.1",
    "Host: 127.0.0.1:7878",
    "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:99.0) Gecko/20100101 Firefox/99.0",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language: en-US,en;q=0.5",
    "Accept-Encoding: gzip, deflate, br",
    "DNT: 1",
    "Connection: keep-alive",
    "Upgrade-Insecure-Requests: 1",
    "Sec-Fetch-Dest: document",
    "Sec-Fetch-Mode: navigate",
    "Sec-Fetch-Site: none",
    "Sec-Fetch-User: ?1",
    "Cache-Control: max-age=0",
]
```
Depending on your browser, you might get slightly different output. Now that
we’re printing the request data, we can see why we get multiple connections from
one browser request by looking at the path after `GET` in the first line of the
request. If the repeated connections are all requesting _/_, we know the browser
is trying to fetch _/_ repeatedly because it’s not getting a response from our
program.
Let’s break down this request data to understand what the browser is asking of
our program.
### A Closer Look at an HTTP Request
HTTP is a text-based protocol, and a request takes this format:
```text
Method Request-URI HTTP-Version CRLF
headers CRLF
message-body
```
The first line is the _request line_ that holds information about what the
client is requesting. The first part of the request line indicates the _method_
being used, such as `GET` or `POST`, which describes how the client is making
this request. Our client used a `GET` request, which means it is asking for
information.
The next part of the request line is _/_, which indicates the _Uniform Resource
Identifier_ _(URI)_ the client is requesting: a URI is almost, but not quite,
the same as a _Uniform Resource Locator_ _(URL)_. The difference between URIs
and URLs isn’t important for our purposes in this chapter, but the HTTP spec
uses the term URI, so we can just mentally substitute URL for URI here.
The last part is the HTTP version the client uses, and then the request line
ends in a _CRLF sequence_. (CRLF stands for _carriage return_ and _line feed_,
which are terms from the typewriter days!) The CRLF sequence can also be written
as `\r\n`, where `\r` is a carriage return and `\n` is a line feed. The CRLF
sequence separates the request line from the rest of the request data. Note that
when the CRLF is printed, we see a new line start rather than `\r\n`.
Looking at the request line data we received from running our program so far, we
see that `GET` is the method, _/_ is the request URI, and `HTTP/1.1` is the
version.
After the request line, the remaining lines starting from `Host:` onward are
headers. `GET` requests have no body.
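As a rough illustration of the three parts of the request line, the sketch below splits a line such as `GET / HTTP/1.1` on whitespace. The `parse_request_line` helper is hypothetical; the chapter's server does not parse the request line this way, and a real HTTP parser would be stricter.

```rust
// Hypothetical helper (not from the listings): split a request line into
// (method, URI, version). Returns `None` unless there are exactly three parts.
fn parse_request_line(line: &str) -> Option<(&str, &str, &str)> {
    let mut parts = line.split_whitespace();
    let method = parts.next()?;
    let uri = parts.next()?;
    let version = parts.next()?;
    // Anything left over means the line does not match the expected format.
    if parts.next().is_some() {
        return None;
    }
    Some((method, uri, version))
}

fn main() {
    match parse_request_line("GET / HTTP/1.1") {
        Some((method, uri, version)) => {
            println!("method: {method}, URI: {uri}, version: {version}")
        }
        None => println!("malformed request line"),
    }
}
```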
Try making a request from a different browser or asking for a different address,
such as _127.0.0.1:7878/test_, to see how the request data changes.
Now that we know what the browser is asking for, let’s send back some data!
### Writing a Response
We’re going to implement sending data in response to a client request. Responses
have the following format:
```text
HTTP-Version Status-Code Reason-Phrase CRLF
headers CRLF
message-body
```
The first line is a _status line_ that contains the HTTP version used in the
response, a numeric status code that summarizes the result of the request, and a
reason phrase that provides a text description of the status code. After the
CRLF sequence are any headers, another CRLF sequence, and the body of the
response.
Here is an example response that uses HTTP version 1.1, has a status code of
200, an OK reason phrase, no headers, and no body:
```text
HTTP/1.1 200 OK\r\n\r\n
```
The status code 200 is the standard success response. The text is a tiny
successful HTTP response. Let’s write this to the stream as our response to a
successful request! From the `handle_connection` function, remove the `println!`
that was printing the request data and replace it with the code in Listing 21-3.
<Listing number="21-3" file-name="src/main.rs" caption="Writing a tiny successful HTTP response to the stream">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-03/src/main.rs:here}}
```
</Listing>
The first new line defines the `response` variable that holds the success
message’s data. Then we call `as_bytes` on our `response` to convert the string
data to bytes. The `write_all` method on `stream` takes a `&[u8]` and sends
those bytes directly down the connection. Because the `write_all` operation
could fail, we use `unwrap` on any error result as before. Again, in a real
application you would add error handling here.
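To see exactly which bytes `write_all` sends, the same two steps can be pointed at a `Vec<u8>` instead of a `TcpStream`, since both implement `std::io::Write`. The `write_ok_response` helper below is a hypothetical name for this sketch; it returns the `io::Result` rather than calling `unwrap`, which is one way the error handling mentioned above could begin.

```rust
use std::io::{self, Write};

// Hypothetical helper (not from the listings): write the tiny success
// response to any writer and hand errors back to the caller.
fn write_ok_response<W: Write>(mut writer: W) -> io::Result<()> {
    let response = "HTTP/1.1 200 OK\r\n\r\n";
    // `as_bytes` gives the `&[u8]` that `write_all` expects.
    writer.write_all(response.as_bytes())
}

fn main() -> io::Result<()> {
    // A `Vec<u8>` stands in for the stream so we can inspect what was sent.
    let mut sent = Vec::new();
    write_ok_response(&mut sent)?;
    println!("{:?}", String::from_utf8_lossy(&sent));
    Ok(())
}
```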
With these changes, let’s run our code and make a request. We’re no longer
printing any data to the terminal, so we won’t see any output other than the
output from Cargo. When you load _127.0.0.1:7878_ in a web browser, you should
get a blank page instead of an error. You’ve just hand-coded receiving an HTTP
request and sending a response!
### Returning Real HTML
Let’s implement the functionality for returning more than a blank page. Create
the new file _hello.html_ in the root of your project directory, not in the
_src_ directory. You can input any HTML you want; Listing 21-4 shows one
possibility.
<Listing number="21-4" file-name="hello.html" caption="A sample HTML file to return in a response">
```html
{{#include ../listings/ch21-web-server/listing-21-05/hello.html}}
```
</Listing>
This is a minimal HTML5 document with a heading and some text. To return this
from the server when a request is received, we’ll modify `handle_connection` as
shown in Listing 21-5 to read the HTML file, add it to the response as a body,
and send it.
<Listing number="21-5" file-name="src/main.rs" caption="Sending the contents of *hello.html* as the body of the response">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-05/src/main.rs:here}}
```
</Listing>
We’ve added `fs` to the `use` statement to bring the standard library’s
filesystem module into scope. The code for reading the contents of a file to a
string should look familiar; we used it in Chapter 12 when we read the contents
of a file for our I/O project in Listing 12-4.
Next, we use `format!` to add the file’s contents as the body of the success
response. To ensure a valid HTTP response, we add the `Content-Length` header,
which is set to the size of our response body in bytes, in this case the size of
_hello.html_.
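The layout of such a response can be sketched in isolation. The `build_response` helper below is a hypothetical function, not the code from Listing 21-5; it only shows how a status line, a `Content-Length` header, the blank line, and the body fit together.

```rust
// Hypothetical helper (not from the listings): assemble a response from a
// status line and a body.
fn build_response(status_line: &str, body: &str) -> String {
    // `str::len` counts bytes, not characters, which is the unit
    // `Content-Length` uses.
    let length = body.len();
    format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{body}")
}

fn main() {
    let response = build_response("HTTP/1.1 200 OK", "<h1>Hello!</h1>");
    // Debug formatting shows the `\r\n` sequences explicitly.
    println!("{response:?}");
}
```

Because the length is measured in bytes, a body containing non-ASCII text reports a larger `Content-Length` than its character count.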
Run this code with `cargo run` and load _127.0.0.1:7878_ in your browser; you
should see your HTML rendered!
Currently, we’re ignoring the request data in `http_request` and just sending
back the contents of the HTML file unconditionally. That means if you try
requesting _127.0.0.1:7878/something-else_ in your browser, you’ll still get
back this same HTML response. At the moment, our server is very limited and does
not do what most web servers do. We want to customize our responses depending on
the request and only send back the HTML file for a well-formed request to _/_.
### Validating the Request and Selectively Responding
Right now, our web server will return the HTML in the file no matter what the
client requested. Let’s add functionality to check that the browser is
requesting _/_ before returning the HTML file and return an error if the browser
requests anything else. For this we need to modify `handle_connection`, as shown
in Listing 21-6. This new code checks the content of the request received
against what we know a request for _/_ looks like and adds `if` and `else`
blocks to treat requests differently.
<Listing number="21-6" file-name="src/main.rs" caption="Handling requests to */* differently from other requests">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-06/src/main.rs:here}}
```
</Listing>
We’re only going to be looking at the first line of the HTTP request, so rather
than reading the entire request into a vector, we’re calling `next` to get the
first item from the iterator. The first `unwrap` takes care of the `Option` and
stops the program if the iterator has no items. The second `unwrap` handles the
`Result` and has the same effect as the `unwrap` that was in the `map` added in
Listing 21-2.
Next, we check the `request_line` to see if it equals the request line of a GET
request to the _/_ path. If it does, the `if` block returns the contents of our
HTML file.
If the `request_line` does _not_ equal the GET request to the _/_ path, it means
we’ve received some other request. We’ll add code to the `else` block in a
moment to respond to all other requests.
Run this code now and request _127.0.0.1:7878_; you should get the HTML in
_hello.html_. If you make any other request, such as
_127.0.0.1:7878/something-else_, you’ll get a connection error like those you
saw when running the code in Listing 21-1 and Listing 21-2.
Now let’s add the code in Listing 21-7 to the `else` block to return a response
with the status code 404, which signals that the content for the request was not
found. We’ll also return some HTML for a page to render in the browser
indicating the response to the end user.
<Listing number="21-7" file-name="src/main.rs" caption="Responding with status code 404 and an error page if anything other than */* was requested">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-07/src/main.rs:here}}
```
</Listing>
Here, our response has a status line with status code 404 and the reason phrase
`NOT FOUND`. The body of the response will be the HTML in the file _404.html_.
You’ll need to create a _404.html_ file next to _hello.html_ for the error page;
again, feel free to use any HTML you want or use the example HTML in Listing
21-8.
<Listing number="21-8" file-name="404.html" caption="Sample content for the page to send back with any 404 response">
```html
{{#include ../listings/ch21-web-server/listing-21-07/404.html}}
```
</Listing>
With these changes, run your server again. Requesting _127.0.0.1:7878_ should
return the contents of _hello.html_, and any other request, like
_127.0.0.1:7878/foo_, should return the error HTML from _404.html_.
### A Touch of Refactoring
At the moment, the `if` and `else` blocks have a lot of repetition: they’re both
reading files and writing the contents of the files to the stream. The only
differences are the status line and the filename. Let’s make the code more
concise by pulling out those differences into separate `if` and `else` lines
that will assign the values of the status line and the filename to variables; we
can then use those variables unconditionally in the code to read the file and
write the response. Listing 21-9 shows the resulting code after replacing the
large `if` and `else` blocks.
<Listing number="21-9" file-name="src/main.rs" caption="Refactoring the `if` and `else` blocks to contain only the code that differs between the two cases">
```rust,no_run
{{#rustdoc_include ../listings/ch21-web-server/listing-21-09/src/main.rs:here}}
```
</Listing>
Now the `if` and `else` blocks only return the appropriate values for the status
line and filename in a tuple; we then use destructuring to assign these two
values to `status_line` and `filename` using a pattern in the `let` statement,
as discussed in Chapter 19.
The previously duplicated code is now outside the `if` and `else` blocks and
uses the `status_line` and `filename` variables. This makes it easier to see the
difference between the two cases, and it means we have only one place to update
the code if we want to change how the file reading and response writing work.
The behavior of the code in Listing 21-9 will be the same as that in Listing
21-7.
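The tuple-returning `if`/`else` can also be looked at on its own. The sketch below pulls that decision into a hypothetical `route` function so the two cases are easy to compare. It assumes the server compares against the exact request line `GET / HTTP/1.1`, and it leaves out the file reading and response writing.

```rust
// Hypothetical helper (not from the listings): choose the status line and
// filename for a request line. The exact string compared here is an
// assumption based on the request line shown earlier in the chapter.
fn route(request_line: &str) -> (&'static str, &'static str) {
    if request_line == "GET / HTTP/1.1" {
        ("HTTP/1.1 200 OK", "hello.html")
    } else {
        ("HTTP/1.1 404 NOT FOUND", "404.html")
    }
}

fn main() {
    for request_line in ["GET / HTTP/1.1", "GET /foo HTTP/1.1"] {
        // Destructure the tuple with a pattern in the `let` statement.
        let (status_line, filename) = route(request_line);
        println!("{request_line} -> {status_line}, {filename}");
    }
}
```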
Awesome! We now have a simple web server in approximately 40 lines of Rust code
that responds to one request with a page of content and responds to all other
requests with a 404 response.
Currently, our server runs in a single thread, meaning it can only serve one
request at a time. Let’s examine how that can be a problem by simulating some
slow requests. Then we’ll fix it so our server can handle multiple requests at
once.