r[input]
# Input format

r[input.intro]
This chapter describes how a source file is interpreted as a sequence of tokens.

See [Crates and source files] for a description of how programs are organised into files.

r[input.encoding]
## Source encoding

r[input.encoding.utf8]
Each source file is interpreted as a sequence of Unicode characters encoded in UTF-8.

r[input.encoding.invalid]
It is an error if the file is not valid UTF-8.
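
The encoding rule can be illustrated with the standard library's UTF-8 validation. This is a minimal sketch, not the compiler's implementation; `decode_source` is a hypothetical helper name introduced only for this example.

```rust
/// Interprets the raw bytes of a source file as UTF-8 text.
///
/// `decode_source` is a hypothetical helper, not a compiler API.
/// It returns an error if the bytes are not valid UTF-8, mirroring
/// the rule that such a file is rejected.
fn decode_source(bytes: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // `from_utf8` validates the whole slice without copying it.
    std::str::from_utf8(bytes)
}
```

With this helper, `decode_source(b"fn main() {}")` succeeds, while a byte sequence containing `0xFF` (which never appears in valid UTF-8) yields an error.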

r[input.byte-order-mark]
## Byte order mark removal

If the first character in the sequence is `U+FEFF` ([BYTE ORDER MARK]), it is removed.
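
As an illustration of this step, the following sketch removes a single leading `U+FEFF` from already-decoded text. `strip_bom` is a hypothetical name chosen for this example, not a compiler function.

```rust
/// Removes one leading U+FEFF (byte order mark), if present.
///
/// `strip_bom` is a hypothetical helper, not a compiler API.
fn strip_bom(src: &str) -> &str {
    // `strip_prefix` removes at most one occurrence, and only at the
    // very start, so a U+FEFF anywhere else in the text is left alone.
    src.strip_prefix('\u{FEFF}').unwrap_or(src)
}
```

Only the first character is affected: `strip_bom("\u{FEFF}fn main() {}")` gives `"fn main() {}"`, and text without a leading byte order mark is returned unchanged.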

r[input.crlf]
## CRLF normalization

Each pair of characters `U+000D` (CR) immediately followed by `U+000A` (LF) is replaced by a single `U+000A` (LF).

Other occurrences of the character `U+000D` (CR) are left in place (they are treated as [whitespace]).
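
The normalization above can be sketched as a single left-to-right replacement. `normalize_crlf` is a hypothetical name used only for this example.

```rust
/// Replaces each CR LF pair with a single LF.
///
/// `normalize_crlf` is a hypothetical helper, not a compiler API.
fn normalize_crlf(src: &str) -> String {
    // `str::replace` scans once, left to right, so a CR that is not
    // immediately followed by LF is copied through unchanged.
    src.replace("\r\n", "\n")
}
```

For example, `normalize_crlf("a\r\nb")` gives `"a\nb"`, while the lone CR in `"a\rb"` is left in place.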

r[input.shebang]
## Shebang removal

r[input.shebang.intro]
If the remaining sequence begins with the characters `#!`, the characters up to and including the first `U+000A` (LF) are removed from the sequence.

For example, the first line of the following file would be ignored:

<!-- ignore: tests don't like shebang -->
```rust,ignore
#!/usr/bin/env rustx

fn main() {
    println!("Hello!");
}
```

r[input.shebang.inner-attribute]
As an exception, if the `#!` characters are followed (ignoring intervening [comments] or [whitespace]) by a `[` token, nothing is removed.
This prevents an [inner attribute] at the start of a source file from being removed.
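
The rule and its exception can be sketched as follows. This is a simplified illustration under stated assumptions, not the compiler's implementation: `strip_shebang` is a hypothetical name, comments between `#!` and `[` are not handled, `trim_start` only approximates the language's definition of whitespace, and the behavior for a file with no LF is an assumption.

```rust
/// Removes a leading shebang line, keeping inner attributes intact.
///
/// `strip_shebang` is a hypothetical helper, not a compiler API.
fn strip_shebang(src: &str) -> &str {
    let rest = match src.strip_prefix("#!") {
        Some(rest) => rest,
        None => return src, // does not begin with `#!`
    };
    // Exception: `#!` followed by `[` starts an inner attribute.
    // Simplification: only whitespace is skipped here, not comments,
    // and `trim_start` uses Unicode White_Space as an approximation.
    if rest.trim_start().starts_with('[') {
        return src;
    }
    match src.find('\n') {
        // Drop everything up to and including the first LF.
        Some(idx) => &src[idx + 1..],
        // Assumption: with no LF at all, the whole input is the shebang line.
        None => "",
    }
}
```

Under these assumptions, `strip_shebang("#!/usr/bin/env rustx\nfn main() {}\n")` gives `"fn main() {}\n"`, while input starting with `#![allow(unused)]` is returned unchanged.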

> [!NOTE]
> The standard library [`include!`] macro applies byte order mark removal, CRLF normalization, and shebang removal to the file it reads. The [`include_str!`] and [`include_bytes!`] macros do not.

r[input.tokenization]
## Tokenization

The resulting sequence of characters is then converted into tokens as described in the remainder of this chapter.

[inner attribute]: attributes.md
[BYTE ORDER MARK]: https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
[comments]: comments.md
[Crates and source files]: crates-and-source-files.md
[_shebang_]: https://en.wikipedia.org/wiki/Shebang_(Unix)
[whitespace]: whitespace.md