| r[lex.token] |
| # Tokens |
| |
| r[lex.token.syntax] |
| ```grammar,lexer |
| Token -> |
| IDENTIFIER_OR_KEYWORD |
| | RAW_IDENTIFIER |
| | CHAR_LITERAL |
| | STRING_LITERAL |
| | RAW_STRING_LITERAL |
| | BYTE_LITERAL |
| | BYTE_STRING_LITERAL |
| | RAW_BYTE_STRING_LITERAL |
| | C_STRING_LITERAL |
| | RAW_C_STRING_LITERAL |
| | INTEGER_LITERAL |
| | FLOAT_LITERAL |
| | LIFETIME_TOKEN |
| | PUNCTUATION |
| | RESERVED_TOKEN |
| ``` |
| |
| r[lex.token.intro] |
| Tokens are primitive productions in the grammar defined by regular |
| (non-recursive) languages. Rust source input can be broken down |
| into the following kinds of tokens: |
| |
| * [Keywords] |
| * [Identifiers][identifier] |
| * [Literals](#literals) |
| * [Lifetimes](#lifetimes-and-loop-labels) |
| * [Punctuation](#punctuation) |
| * [Delimiters](#delimiters) |
| |
| Within this documentation's grammar, "simple" tokens are given in [string |
| table production] form, and appear in `monospace` font. |
| |
| [string table production]: notation.md#string-table-productions |
| |
| r[lex.token.literal] |
| ## Literals |
| |
| Literals are tokens used in [literal expressions]. |
| |
| ### Examples |
| |
| #### Characters and strings |
| |
| | | Example | `#` sets[^nsets] | Characters | Escapes | |
| |----------------------------------------------|-----------------|------------|-------------|---------------------| |
| | [Character](#character-literals) | `'H'` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) | |
| | [String](#string-literals) | `"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [ASCII](#ascii-escapes) & [Unicode](#unicode-escapes) | |
| | [Raw string](#raw-string-literals) | `r#"hello"#` | <256 | All Unicode | `N/A` | |
| | [Byte](#byte-literals) | `b'H'` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) | |
| | [Byte string](#byte-string-literals) | `b"hello"` | 0 | All ASCII | [Quote](#quote-escapes) & [Byte](#byte-escapes) | |
| | [Raw byte string](#raw-byte-string-literals) | `br#"hello"#` | <256 | All ASCII | `N/A` | |
| | [C string](#c-string-literals) | `c"hello"` | 0 | All Unicode | [Quote](#quote-escapes) & [Byte](#byte-escapes) & [Unicode](#unicode-escapes) | |
| | [Raw C string](#raw-c-string-literals) | `cr#"hello"#` | <256 | All Unicode | `N/A` | |
| |
| [^nsets]: The number of `#`s on each side of the same literal must be equivalent. |
| |
| > [!NOTE] |
| > Character and string literal tokens never include the sequence of `U+000D` (CR) immediately followed by `U+000A` (LF): this pair would have been previously transformed into a single `U+000A` (LF). |
| |
| #### ASCII escapes |
| |
| | | Name | |
| |---|------| |
| | `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) | |
| | `\n` | Newline | |
| | `\r` | Carriage return | |
| | `\t` | Tab | |
| | `\\` | Backslash | |
| | `\0` | Null | |
| |
| #### Byte escapes |
| |
| | | Name | |
| |---|------| |
| | `\x7F` | 8-bit character code (exactly 2 digits) | |
| | `\n` | Newline | |
| | `\r` | Carriage return | |
| | `\t` | Tab | |
| | `\\` | Backslash | |
| | `\0` | Null | |
| |
| #### Unicode escapes |
| |
| | | Name | |
| |---|------| |
| | `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) | |
| |
| #### Quote escapes |
| |
| | | Name | |
| |---|------| |
| | `\'` | Single quote | |
| | `\"` | Double quote | |
| |
| #### Numbers |
| |
| | [Number literals](#number-literals)[^nl] | Example | Exponentiation | |
| |----------------------------------------|---------|----------------| |
| | Decimal integer | `98_222` | `N/A` | |
| | Hex integer | `0xff` | `N/A` | |
| | Octal integer | `0o77` | `N/A` | |
| | Binary integer | `0b1111_0000` | `N/A` | |
| | Floating-point | `123.0E+77` | `Optional` | |
| |
| [^nl]: All number literals allow `_` as a visual separator: `1_234.0E+18f64` |
| |
| r[lex.token.literal.suffix] |
| #### Suffixes |
| |
r[lex.token.literal.suffix.intro]
| A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword. |
| |
| r[lex.token.literal.suffix.syntax] |
| ```grammar,lexer |
| SUFFIX -> IDENTIFIER_OR_KEYWORD |
| |
| SUFFIX_NO_E -> SUFFIX _not beginning with `e` or `E`_ |
| ``` |
| |
| r[lex.token.literal.suffix.validity] |
| Any kind of literal (string, integer, etc) with any suffix is valid as a token. |
| |
| A literal token with any suffix can be passed to a macro without producing an error. |
| The macro itself will decide how to interpret such a token and whether to produce an error or not. |
| In particular, the `literal` fragment specifier for by-example macros matches literal tokens with arbitrary suffixes. |
| |
| ```rust |
| macro_rules! blackhole { ($tt:tt) => () } |
| macro_rules! blackhole_lit { ($l:literal) => () } |
| |
| blackhole!("string"suffix); // OK |
| blackhole_lit!(1suffix); // OK |
| ``` |
| |
| r[lex.token.literal.suffix.parse] |
| However, suffixes on literal tokens which are interpreted as literal expressions or patterns are restricted. |
| Any suffixes are rejected on non-numeric literal tokens, |
| and numeric literal tokens are accepted only with suffixes from the list below. |
| |
| | Integer | Floating-point | |
| |---------|----------------| |
| | `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize` | `f32`, `f64` | |
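
For illustration, a small sketch of suffixed numeric literals accepted as literal expressions (mirroring the examples later in this chapter):

```rust
123u32;       // integer literal with the `u32` suffix
1_000_i64;    // `_` separators may appear before the suffix
12E+99_f64;   // floating-point literal with the `f64` suffix
5f32;         // integer literal accepted as a floating-point literal expression
```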
| |
| ### Character and string literals |
| |
| r[lex.token.literal.char] |
| #### Character literals |
| |
| r[lex.token.literal.char.syntax] |
| ```grammar,lexer |
| CHAR_LITERAL -> |
| `'` |
| ( ~[`'` `\` LF CR TAB] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) |
| `'` SUFFIX? |
| |
| QUOTE_ESCAPE -> `\'` | `\"` |
| |
| ASCII_ESCAPE -> |
| `\x` OCT_DIGIT HEX_DIGIT |
| | `\n` | `\r` | `\t` | `\\` | `\0` |
| |
| UNICODE_ESCAPE -> |
| `\u{` ( HEX_DIGIT `_`* ){1..6} `}` |
| ``` |
| |
| r[lex.token.literal.char.intro] |
| A _character literal_ is a single Unicode character enclosed within two |
| `U+0027` (single-quote) characters, with the exception of `U+0027` itself, |
| which must be _escaped_ by a preceding `U+005C` character (`\`). |
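
For illustration, a minimal sketch of character literals using the escapes described below:

```rust
assert_eq!('\x41', 'A');      // 7-bit code point escape
assert_eq!('\u{00E6}', 'æ');  // 24-bit code point escape
assert_eq!('\'', '\u{0027}'); // the single-quote itself must be escaped
```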
| |
| r[lex.token.literal.str] |
| #### String literals |
| |
| r[lex.token.literal.str.syntax] |
| ```grammar,lexer |
| STRING_LITERAL -> |
| `"` ( |
| ~[`"` `\` CR] |
| | QUOTE_ESCAPE |
| | ASCII_ESCAPE |
| | UNICODE_ESCAPE |
| | STRING_CONTINUE |
| )* `"` SUFFIX? |
| |
| STRING_CONTINUE -> `\` LF |
| ``` |
| |
| r[lex.token.literal.str.intro] |
| A _string literal_ is a sequence of any Unicode characters enclosed within two |
| `U+0022` (double-quote) characters, with the exception of `U+0022` itself, |
| which must be _escaped_ by a preceding `U+005C` character (`\`). |
| |
| r[lex.token.literal.str.linefeed] |
| Line-breaks, represented by the character `U+000A` (LF), are allowed in string literals. |
| When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token. |
| See [String continuation escapes] for details. |
| The character `U+000D` (CR) may not appear in a string literal other than as part of such a string continuation escape. |
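
For example, a short sketch of the string continuation escape:

```rust
let s = "foo\
         bar";
// The backslash, the line break, and the leading whitespace of the
// following line are all omitted from the resulting string.
assert_eq!(s, "foobar");
```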
| |
| r[lex.token.literal.char-escape] |
| #### Character escapes |
| |
| r[lex.token.literal.char-escape.intro] |
| Some additional _escapes_ are available in either character or non-raw string |
| literals. An escape starts with a `U+005C` (`\`) and continues with one of the |
| following forms: |
| |
| r[lex.token.literal.char-escape.ascii] |
| * A _7-bit code point escape_ starts with `U+0078` (`x`) and is |
| followed by exactly two _hex digits_ with value up to `0x7F`. It denotes the |
| ASCII character with value equal to the provided hex value. Higher values are |
| not permitted because it is ambiguous whether they mean Unicode code points or |
| byte values. |
| |
| r[lex.token.literal.char-escape.unicode] |
| * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed |
| by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D` |
| (`}`). It denotes the Unicode code point equal to the provided hex value. |
| |
| r[lex.token.literal.char-escape.whitespace] |
| * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072` |
| (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF), |
| `U+000D` (CR) or `U+0009` (HT) respectively. |
| |
| r[lex.token.literal.char-escape.null] |
| * The _null escape_ is the character `U+0030` (`0`) and denotes the Unicode |
| value `U+0000` (NUL). |
| |
| r[lex.token.literal.char-escape.slash] |
| * The _backslash escape_ is the character `U+005C` (`\`) which must be |
| escaped in order to denote itself. |
| |
| r[lex.token.literal.str-raw] |
| #### Raw string literals |
| |
| r[lex.token.literal.str-raw.syntax] |
| ```grammar,lexer |
| RAW_STRING_LITERAL -> `r` RAW_STRING_CONTENT SUFFIX? |
| |
| RAW_STRING_CONTENT -> |
| `"` ( ~CR )*? `"` |
| | `#` RAW_STRING_CONTENT `#` |
| ``` |
| |
| r[lex.token.literal.str-raw.intro] |
| Raw string literals do not process any escapes. They start with the character |
| `U+0072` (`r`), followed by fewer than 256 of the character `U+0023` (`#`) and a |
| `U+0022` (double-quote) character. |
| |
| r[lex.token.literal.str-raw.body] |
| The _raw string body_ can contain any sequence of Unicode characters other than `U+000D` (CR). |
| It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character. |
| |
| r[lex.token.literal.str-raw.content] |
All Unicode characters contained in the raw string body represent themselves;
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
| |
| Examples for string literals: |
| |
| ```rust |
| "foo"; r"foo"; // foo |
| "\"foo\""; r#""foo""#; // "foo" |
| |
| "foo #\"# bar"; |
| r##"foo #"# bar"##; // foo #"# bar |
| |
| "\x52"; "R"; r"R"; // R |
| "\\x52"; r"\x52"; // \x52 |
| ``` |
| |
| ### Byte and byte string literals |
| |
| r[lex.token.byte] |
| #### Byte literals |
| |
| r[lex.token.byte.syntax] |
| ```grammar,lexer |
| BYTE_LITERAL -> |
| `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'` SUFFIX? |
| |
| ASCII_FOR_CHAR -> |
| <any ASCII (i.e. 0x00 to 0x7F) except `'`, `\`, LF, CR, or TAB> |
| |
| BYTE_ESCAPE -> |
| `\x` HEX_DIGIT HEX_DIGIT |
| | `\n` | `\r` | `\t` | `\\` | `\0` | `\'` | `\"` |
| ``` |
| |
| r[lex.token.byte.intro] |
| A _byte literal_ is a single ASCII character (in the `U+0000` to `U+007F` |
| range) or a single _escape_ preceded by the characters `U+0062` (`b`) and |
| `U+0027` (single-quote), and followed by the character `U+0027`. If the character |
| `U+0027` is present within the literal, it must be _escaped_ by a preceding |
| `U+005C` (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer |
| _number literal_. |
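
A small sketch of byte literals and the `u8` values they denote:

```rust
assert_eq!(b'A', 65u8);     // a plain ASCII character
assert_eq!(b'\'', 0x27u8);  // the single-quote must be escaped
assert_eq!(b'\xFF', 255u8); // byte escapes cover the full 8-bit range
```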
| |
| r[lex.token.str-byte] |
| #### Byte string literals |
| |
| r[lex.token.str-byte.syntax] |
| ```grammar,lexer |
| BYTE_STRING_LITERAL -> |
| `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )* `"` SUFFIX? |
| |
| ASCII_FOR_STRING -> |
<any ASCII (i.e. 0x00 to 0x7F) except `"`, `\`, or CR>
| ``` |
| |
| r[lex.token.str-byte.intro] |
| A non-raw _byte string literal_ is a sequence of ASCII characters and _escapes_, |
| preceded by the characters `U+0062` (`b`) and `U+0022` (double-quote), and |
| followed by the character `U+0022`. If the character `U+0022` is present within |
| the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character. |
| Alternatively, a byte string literal can be a _raw byte string literal_, defined |
| below. |
| |
| r[lex.token.str-byte.linefeed] |
| Line-breaks, represented by the character `U+000A` (LF), are allowed in byte string literals. |
| When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token. |
| See [String continuation escapes] for details. |
| The character `U+000D` (CR) may not appear in a byte string literal other than as part of such a string continuation escape. |
| |
| r[lex.token.str-byte.escape] |
| Some additional _escapes_ are available in either byte or non-raw byte string |
| literals. An escape starts with a `U+005C` (`\`) and continues with one of the |
| following forms: |
| |
| r[lex.token.str-byte.escape-byte] |
* A _byte escape_ starts with `U+0078` (`x`) and is
| followed by exactly two _hex digits_. It denotes the byte |
| equal to the provided hex value. |
| |
| r[lex.token.str-byte.escape-whitespace] |
| * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072` |
(`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
| `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively. |
| |
| r[lex.token.str-byte.escape-null] |
| * The _null escape_ is the character `U+0030` (`0`) and denotes the byte |
| value `0x00` (ASCII NUL). |
| |
| r[lex.token.str-byte.escape-slash] |
| * The _backslash escape_ is the character `U+005C` (`\`) which must be |
| escaped in order to denote its ASCII encoding `0x5C`. |
| |
| r[lex.token.str-byte-raw] |
| #### Raw byte string literals |
| |
| r[lex.token.str-byte-raw.syntax] |
| ```grammar,lexer |
| RAW_BYTE_STRING_LITERAL -> |
| `br` RAW_BYTE_STRING_CONTENT SUFFIX? |
| |
| RAW_BYTE_STRING_CONTENT -> |
| `"` ASCII_FOR_RAW*? `"` |
| | `#` RAW_BYTE_STRING_CONTENT `#` |
| |
| ASCII_FOR_RAW -> |
| <any ASCII (i.e. 0x00 to 0x7F) except CR> |
| ``` |
| |
| r[lex.token.str-byte-raw.intro] |
| Raw byte string literals do not process any escapes. They start with the |
| character `U+0062` (`b`), followed by `U+0072` (`r`), followed by fewer than 256 |
| of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. |
| |
| r[lex.token.str-byte-raw.body] |
| The _raw string body_ can contain any sequence of ASCII characters other than `U+000D` (CR). |
| It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character. |
A raw byte string literal cannot contain any non-ASCII byte.
| |
r[lex.token.str-byte-raw.content]
All characters contained in the raw string body represent their ASCII encoding;
the characters `U+0022` (double-quote) (except when followed by at least as
many `U+0023` (`#`) characters as were used to start the raw string literal) and
`U+005C` (`\`) do not have any special meaning.
| |
| Examples for byte string literals: |
| |
| ```rust |
| b"foo"; br"foo"; // foo |
| b"\"foo\""; br#""foo""#; // "foo" |
| |
| b"foo #\"# bar"; |
| br##"foo #"# bar"##; // foo #"# bar |
| |
| b"\x52"; b"R"; br"R"; // R |
| b"\\x52"; br"\x52"; // \x52 |
| ``` |
| |
| ### C string and raw C string literals |
| |
| r[lex.token.str-c] |
| #### C string literals |
| |
| r[lex.token.str-c.syntax] |
| ```grammar,lexer |
| C_STRING_LITERAL -> |
| `c"` ( |
| ~[`"` `\` CR NUL] |
| | BYTE_ESCAPE _except `\0` or `\x00`_ |
| | UNICODE_ESCAPE _except `\u{0}`, `\u{00}`, …, `\u{000000}`_ |
| | STRING_CONTINUE |
| )* `"` SUFFIX? |
| |
| ``` |
| |
| r[lex.token.str-c.intro] |
| A _C string literal_ is a sequence of Unicode characters and _escapes_, |
| preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and |
| followed by the character `U+0022`. If the character `U+0022` is present within |
| the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character. |
| Alternatively, a C string literal can be a _raw C string literal_, defined below. |
| |
| [CStr]: core::ffi::CStr |
| |
| r[lex.token.str-c.null] |
| C strings are implicitly terminated by byte `0x00`, so the C string literal |
| `c""` is equivalent to manually constructing a `&CStr` from the byte string |
| literal `b"\x00"`. Other than the implicit terminator, byte `0x00` is not |
| permitted within a C string. |
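
A minimal sketch of this equivalence, using `CStr::from_bytes_with_nul` from the standard library:

```rust
use core::ffi::CStr;

let literal: &CStr = c"";
let manual: &CStr = CStr::from_bytes_with_nul(b"\x00").unwrap();
assert_eq!(literal, manual);
```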
| |
| r[lex.token.str-c.linefeed] |
| Line-breaks, represented by the character `U+000A` (LF), are allowed in C string literals. |
| When an unescaped `U+005C` character (`\`) occurs immediately before a line break, the line break does not appear in the string represented by the token. |
| See [String continuation escapes] for details. |
| The character `U+000D` (CR) may not appear in a C string literal other than as part of such a string continuation escape. |
| |
| r[lex.token.str-c.escape] |
| Some additional _escapes_ are available in non-raw C string literals. An escape |
| starts with a `U+005C` (`\`) and continues with one of the following forms: |
| |
| r[lex.token.str-c.escape-byte] |
* A _byte escape_ starts with `U+0078` (`x`) and is followed by exactly
| two _hex digits_. It denotes the byte equal to the provided hex value. |
| |
| r[lex.token.str-c.escape-unicode] |
| * A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed |
| by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D` |
| (`}`). It denotes the Unicode code point equal to the provided hex value, |
| encoded as UTF-8. |
| |
| r[lex.token.str-c.escape-whitespace] |
| * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072` |
(`r`), or `U+0074` (`t`), denoting the byte values `0x0A` (ASCII LF),
| `0x0D` (ASCII CR) or `0x09` (ASCII HT) respectively. |
| |
| r[lex.token.str-c.escape-slash] |
| * The _backslash escape_ is the character `U+005C` (`\`) which must be |
| escaped in order to denote its ASCII encoding `0x5C`. |
| |
| r[lex.token.str-c.char-unicode] |
| A C string represents bytes with no defined encoding, but a C string literal |
| may contain Unicode characters above `U+007F`. Such characters will be replaced |
| with the bytes of that character's UTF-8 representation. |
| |
| The following C string literals are equivalent: |
| |
| ```rust |
| c"æ"; // LATIN SMALL LETTER AE (U+00E6) |
| c"\u{00E6}"; |
| c"\xC3\xA6"; |
| ``` |
| |
| r[lex.token.str-c.edition2021] |
| > [!EDITION-2021] |
> C string literals are accepted in the 2021 edition or later. In earlier editions the token `c""` is lexed as `c ""`.
| |
| r[lex.token.str-c-raw] |
| #### Raw C string literals |
| |
| r[lex.token.str-c-raw.syntax] |
| ```grammar,lexer |
| RAW_C_STRING_LITERAL -> |
| `cr` RAW_C_STRING_CONTENT SUFFIX? |
| |
| RAW_C_STRING_CONTENT -> |
| `"` ( ~[CR NUL] )*? `"` |
| | `#` RAW_C_STRING_CONTENT `#` |
| ``` |
| |
| r[lex.token.str-c-raw.intro] |
| Raw C string literals do not process any escapes. They start with the |
| character `U+0063` (`c`), followed by `U+0072` (`r`), followed by fewer than 256 |
| of the character `U+0023` (`#`), and a `U+0022` (double-quote) character. |
| |
| r[lex.token.str-c-raw.body] |
| The _raw C string body_ can contain any sequence of Unicode characters other than `U+0000` (NUL) and `U+000D` (CR). |
| It is terminated only by another `U+0022` (double-quote) character, followed by the same number of `U+0023` (`#`) characters that preceded the opening `U+0022` (double-quote) character. |
| |
| r[lex.token.str-c-raw.content] |
| All characters contained in the raw C string body represent themselves in UTF-8 |
| encoding. The characters `U+0022` (double-quote) (except when followed by at |
| least as many `U+0023` (`#`) characters as were used to start the raw C string |
literal) and `U+005C` (`\`) do not have any special meaning.
| |
| r[lex.token.str-c-raw.edition2021] |
| > [!EDITION-2021] |
> Raw C string literals are accepted in the 2021 edition or later. In earlier editions the token `cr""` is lexed as `cr ""`, and `cr#""#` is lexed as `cr #""#` (which is non-grammatical).
| |
| #### Examples for C string and raw C string literals |
| |
| ```rust |
| c"foo"; cr"foo"; // foo |
| c"\"foo\""; cr#""foo""#; // "foo" |
| |
| c"foo #\"# bar"; |
| cr##"foo #"# bar"##; // foo #"# bar |
| |
| c"\x52"; c"R"; cr"R"; // R |
| c"\\x52"; cr"\x52"; // \x52 |
| ``` |
| |
| r[lex.token.literal.num] |
| ### Number literals |
| |
| A _number literal_ is either an _integer literal_ or a _floating-point |
| literal_. The grammar for recognizing the two kinds of literals is mixed. |
| |
| r[lex.token.literal.int] |
| #### Integer literals |
| |
| r[lex.token.literal.int.syntax] |
| ```grammar,lexer |
| INTEGER_LITERAL -> |
| ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) SUFFIX_NO_E? |
| |
| DEC_LITERAL -> DEC_DIGIT (DEC_DIGIT|`_`)* |
| |
| BIN_LITERAL -> `0b` (BIN_DIGIT|`_`)* BIN_DIGIT (BIN_DIGIT|`_`)* |
| |
| OCT_LITERAL -> `0o` (OCT_DIGIT|`_`)* OCT_DIGIT (OCT_DIGIT|`_`)* |
| |
| HEX_LITERAL -> `0x` (HEX_DIGIT|`_`)* HEX_DIGIT (HEX_DIGIT|`_`)* |
| |
| BIN_DIGIT -> [`0`-`1`] |
| |
| OCT_DIGIT -> [`0`-`7`] |
| |
| DEC_DIGIT -> [`0`-`9`] |
| |
| HEX_DIGIT -> [`0`-`9` `a`-`f` `A`-`F`] |
| ``` |
| |
| r[lex.token.literal.int.kind] |
| An _integer literal_ has one of four forms: |
| |
| r[lex.token.literal.int.kind-dec] |
| * A _decimal literal_ starts with a *decimal digit* and continues with any |
| mixture of *decimal digits* and _underscores_. |
| |
| r[lex.token.literal.int.kind-hex] |
| * A _hex literal_ starts with the character sequence `U+0030` `U+0078` |
| (`0x`) and continues as any mixture (with at least one digit) of hex digits |
| and underscores. |
| |
| r[lex.token.literal.int.kind-oct] |
| * An _octal literal_ starts with the character sequence `U+0030` `U+006F` |
| (`0o`) and continues as any mixture (with at least one digit) of octal digits |
| and underscores. |
| |
| r[lex.token.literal.int.kind-bin] |
| * A _binary literal_ starts with the character sequence `U+0030` `U+0062` |
| (`0b`) and continues as any mixture (with at least one digit) of binary digits |
| and underscores. |
| |
| r[lex.token.literal.int.restriction] |
| Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above. |
| The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal. |
| See [Integer literal expressions] for the effect of these suffixes. |
| |
| Examples of integer literals which are accepted as literal expressions: |
| |
| ```rust |
| # #![allow(overflowing_literals)] |
| 123; |
| 123i32; |
| 123u32; |
| 123_u32; |
| |
| 0xff; |
| 0xff_u8; |
| 0x01_f32; // integer 7986, not floating-point 1.0 |
| 0x01_e3; // integer 483, not floating-point 1000.0 |
| |
| 0o70; |
| 0o70_i16; |
| |
| 0b1111_1111_1001_0000; |
| 0b1111_1111_1001_0000i64; |
| 0b________1; |
| |
| 0usize; |
| |
| // These are too big for their type, but are accepted as literal expressions. |
| 128_i8; |
| 256_u8; |
| |
| // This is an integer literal, accepted as a floating-point literal expression. |
| 5f32; |
| ``` |
| |
| Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. |
| |
| Examples of integer literals which are not accepted as literal expressions: |
| |
| ```rust |
| # #[cfg(false)] { |
| 0invalidSuffix; |
| 123AFB43; |
| 0b010a; |
| 0xAB_CD_EF_GH; |
| 0b1111_f32; |
| # } |
| ``` |
| |
| r[lex.token.literal.int.tuple-field] |
| #### Tuple index |
| |
| r[lex.token.literal.int.tuple-field.syntax] |
| ```grammar,lexer |
| TUPLE_INDEX -> INTEGER_LITERAL |
| ``` |
| |
| r[lex.token.literal.int.tuple-field.intro] |
| A tuple index is used to refer to the fields of [tuples], [tuple structs], and |
| [tuple variants]. |
| |
| r[lex.token.literal.int.tuple-field.eq] |
| Tuple indices are compared with the literal token directly. Tuple indices |
| start with `0` and each successive index increments the value by `1` as a |
| decimal value. Thus, only decimal values will match, and the value must not |
| have any extra `0` prefix characters. |
| |
| ```rust,compile_fail |
| let example = ("dog", "cat", "horse"); |
| let dog = example.0; |
| let cat = example.1; |
| // The following examples are invalid. |
| let cat = example.01; // ERROR no field named `01` |
| let horse = example.0b10; // ERROR no field named `0b10` |
| ``` |
| |
| > [!NOTE] |
| > Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version. See <https://github.com/rust-lang/rust/issues/60210> for more information. |
| |
| r[lex.token.literal.float] |
| #### Floating-point literals |
| |
| r[lex.token.literal.float.syntax] |
| ```grammar,lexer |
| FLOAT_LITERAL -> |
| DEC_LITERAL `.` _not immediately followed by `.`, `_` or an XID_Start character_ |
| | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E? |
| | DEC_LITERAL (`.` DEC_LITERAL)? FLOAT_EXPONENT SUFFIX? |
| |
| FLOAT_EXPONENT -> |
| (`e`|`E`) (`+`|`-`)? (DEC_DIGIT|`_`)* DEC_DIGIT (DEC_DIGIT|`_`)* |
| ``` |
| |
| r[lex.token.literal.float.form] |
| A _floating-point literal_ has one of two forms: |
| |
| * A _decimal literal_ followed by a period character `U+002E` (`.`). This is |
| optionally followed by another decimal literal, with an optional _exponent_. |
| * A single _decimal literal_ followed by an _exponent_. |
| |
| r[lex.token.literal.float.suffix] |
| Like integer literals, a floating-point literal may be followed by a |
| suffix, so long as the pre-suffix part does not end with `U+002E` (`.`). |
| The suffix may not begin with `e` or `E` if the literal does not include an exponent. |
| See [Floating-point literal expressions] for the effect of these suffixes. |
| |
| Examples of floating-point literals which are accepted as literal expressions: |
| |
| ```rust |
| 123.0f64; |
| 0.1f64; |
| 0.1f32; |
| 12E+99_f64; |
| let x: f64 = 2.; |
| ``` |
| |
This last example is different because it is not possible to use the suffix
syntax with a floating-point literal ending in a period. `2.f64` would attempt
to call a method named `f64` on `2`.
| |
| Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. |
| |
| Examples of floating-point literals which are not accepted as literal expressions: |
| |
| ```rust |
| # #[cfg(false)] { |
| 2.0f80; |
| 2e5f80; |
| 2e5e6; |
| 2.0e5e6; |
| 1.3e10u64; |
| # } |
| ``` |
| |
| r[lex.token.literal.reserved] |
| #### Reserved forms similar to number literals |
| |
| r[lex.token.literal.reserved.syntax] |
| ```grammar,lexer |
| RESERVED_NUMBER -> |
| BIN_LITERAL [`2`-`9`] |
| | OCT_LITERAL [`8`-`9`] |
| | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` _not immediately followed by `.`, `_` or an XID_Start character_ |
| | ( BIN_LITERAL | OCT_LITERAL ) (`e`|`E`) |
| | `0b` `_`* <end of input or not BIN_DIGIT> |
| | `0o` `_`* <end of input or not OCT_DIGIT> |
| | `0x` `_`* <end of input or not HEX_DIGIT> |
| | DEC_LITERAL ( `.` DEC_LITERAL )? (`e` | `E`) (`+` | `-`)? <end of input or not DEC_DIGIT> |
| |
| ``` |
| |
| r[lex.token.literal.reserved.intro] |
| The following lexical forms similar to number literals are _reserved forms_. |
| Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens. |
| |
| r[lex.token.literal.reserved.out-of-range] |
| * An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix. |
| |
| r[lex.token.literal.reserved.period] |
| * An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals). |
| |
| r[lex.token.literal.reserved.exp] |
| * An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`. |
| |
| r[lex.token.literal.reserved.empty-with-radix] |
| * Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits). |
| |
| r[lex.token.literal.reserved.empty-exp] |
| * Input which has the form of a floating-point literal with no digits in the exponent. |
| |
| Examples of reserved forms: |
| |
| ```rust,compile_fail |
| 0b0102; // this is not `0b010` followed by `2` |
| 0o1279; // this is not `0o127` followed by `9` |
| 0x80.0; // this is not `0x80` followed by `.` and `0` |
| 0b101e; // this is not a suffixed literal, or `0b101` followed by `e` |
| 0b; // this is not an integer literal, or `0` followed by `b` |
| 0b_; // this is not an integer literal, or `0` followed by `b_` |
| 2e; // this is not a floating-point literal, or `2` followed by `e` |
| 2.0e; // this is not a floating-point literal, or `2.0` followed by `e` |
| 2em; // this is not a suffixed literal, or `2` followed by `em` |
| 2.0em; // this is not a suffixed literal, or `2.0` followed by `em` |
| ``` |
| |
| r[lex.token.life] |
| ## Lifetimes and loop labels |
| |
| r[lex.token.life.syntax] |
| ```grammar,lexer |
| LIFETIME_TOKEN -> |
| `'` IDENTIFIER_OR_KEYWORD _not immediately followed by `'`_ |
| | `'_` _not immediately followed by `'`_ |
| | RAW_LIFETIME |
| |
| LIFETIME_OR_LABEL -> |
| `'` NON_KEYWORD_IDENTIFIER _not immediately followed by `'`_ |
| | RAW_LIFETIME |
| |
| RAW_LIFETIME -> |
| `'r#` IDENTIFIER_OR_KEYWORD _except `crate`, `self`, `super`, `Self` and not immediately followed by `'`_ |
| |
| RESERVED_RAW_LIFETIME -> `'r#_` _not immediately followed by `'`_ |
| ``` |
| |
| r[lex.token.life.intro] |
Lifetime parameters and [loop labels] use LIFETIME_OR_LABEL tokens. Any
LIFETIME_TOKEN will be accepted by the lexer and, for example, can be used in
macros.
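
For illustration, a minimal sketch: `'a` is a lifetime parameter, `'search` is a loop label, and the macro at the end accepts arbitrary lifetime tokens:

```rust
fn first<'a>(slice: &'a [i32]) -> Option<&'a i32> {
    'search: loop {
        break 'search slice.first();
    }
}
assert_eq!(first(&[7, 8, 9]), Some(&7));

// Any LIFETIME_TOKEN (here `'static`, `'foo`, and `'_`) is accepted by the
// lexer and can be passed to a macro.
macro_rules! lifetimes { ($($t:tt)*) => {} }
lifetimes!('static 'foo '_);
```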
| |
| r[lex.token.life.raw.intro] |
| A raw lifetime is like a normal lifetime, but its identifier is prefixed by `r#`. (Note that the `r#` prefix is not included as part of the actual lifetime.) |
| |
| r[lex.token.life.raw.allowed] |
| Unlike a normal lifetime, a raw lifetime may be any strict or reserved keyword except the ones listed above for `RAW_LIFETIME`. |
| |
| r[lex.token.life.raw.reserved] |
| It is an error to use the RESERVED_RAW_LIFETIME token `'r#_` in order to avoid confusion with the [placeholder lifetime]. |
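
A short sketch of a raw lifetime, assuming a toolchain where raw lifetimes are available (see the edition note below); the lifetime here is named `fn`, which would otherwise collide with the keyword:

```rust
fn pass<'r#fn>(s: &'r#fn str) -> &'r#fn str {
    s
}
assert_eq!(pass("token"), "token");
```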
| |
| r[lex.token.life.raw.edition2021] |
| > [!EDITION-2021] |
> Raw lifetimes are accepted in the 2021 edition or later. In earlier editions the token `'r#lt` is lexed as `'r # lt`.
| |
| r[lex.token.punct] |
| ## Punctuation |
| |
| r[lex.token.punct.syntax] |
| ```grammar,lexer |
| PUNCTUATION -> |
| `=` |
| | `<` |
| | `<=` |
| | `==` |
| | `!=` |
| | `>=` |
| | `>` |
| | `&&` |
| | `||` |
| | `!` |
| | `~` |
| | `+` |
| | `-` |
| | `*` |
| | `/` |
| | `%` |
| | `^` |
| | `&` |
| | `|` |
| | `<<` |
| | `>>` |
| | `+=` |
| | `-=` |
| | `*=` |
| | `/=` |
| | `%=` |
| | `^=` |
| | `&=` |
| | `|=` |
| | `<<=` |
| | `>>=` |
| | `@` |
| | `.` |
| | `..` |
| | `...` |
| | `..=` |
| | `,` |
| | `;` |
| | `:` |
| | `::` |
| | `->` |
| | `<-` |
| | `=>` |
| | `#` |
| | `$` |
| | `?` |
| | `_` |
| | `{` |
| | `}` |
| | `[` |
| | `]` |
| | `(` |
| | `)` |
| ``` |
| |
| r[lex.token.punct.intro] |
| Punctuation symbol tokens are listed here for completeness. Their individual |
| usages and meanings are defined in the linked pages. |
| |
| | Symbol | Name | Usage | |
| |--------|-------------|-------| |
| | `+` | Plus | [Addition][arith], [Trait Bounds], [Macro Kleene Matcher][macros] |
| | `-` | Minus | [Subtraction][arith], [Negation] |
| | `*` | Star | [Multiplication][arith], [Dereference], [Raw Pointers], [Macro Kleene Matcher][macros], [Use wildcards] |
| | `/` | Slash | [Division][arith] |
| | `%` | Percent | [Remainder][arith] |
| | `^` | Caret | [Bitwise and Logical XOR][arith] |
| | `!` | Not | [Bitwise and Logical NOT][negation], [Macro Calls][macros], [Inner Attributes][attributes], [Never Type], [Negative impls] |
| | `&` | And | [Bitwise and Logical AND][arith], [Borrow], [References], [Reference patterns] |
| | <code>\|</code> | Or | [Bitwise and Logical OR][arith], [Closures], Patterns in [match], [if let], and [while let] |
| | `&&` | AndAnd | [Lazy AND][lazy-bool], [Borrow], [References], [Reference patterns] |
| | <code>\|\|</code> | OrOr | [Lazy OR][lazy-bool], [Closures] |
| | `<<` | Shl | [Shift Left][arith], [Nested Generics][generics] |
| | `>>` | Shr | [Shift Right][arith], [Nested Generics][generics] |
| | `+=` | PlusEq | [Addition assignment][compound] |
| | `-=` | MinusEq | [Subtraction assignment][compound] |
| | `*=` | StarEq | [Multiplication assignment][compound] |
| | `/=` | SlashEq | [Division assignment][compound] |
| | `%=` | PercentEq | [Remainder assignment][compound] |
| | `^=` | CaretEq | [Bitwise XOR assignment][compound] |
| | `&=` | AndEq | [Bitwise And assignment][compound] |
| | <code>\|=</code> | OrEq | [Bitwise Or assignment][compound] |
| | `<<=` | ShlEq | [Shift Left assignment][compound] |
| | `>>=` | ShrEq | [Shift Right assignment][compound], [Nested Generics][generics] |
| | `=` | Eq | [Assignment], [Attributes], Various type definitions |
| | `==` | EqEq | [Equal][comparison] |
| | `!=` | Ne | [Not Equal][comparison] |
| | `>` | Gt | [Greater than][comparison], [Generics], [Paths] |
| | `<` | Lt | [Less than][comparison], [Generics], [Paths] |
| | `>=` | Ge | [Greater than or equal to][comparison], [Generics] |
| | `<=` | Le | [Less than or equal to][comparison] |
| | `@` | At | [Subpattern binding] |
| | `_` | Underscore | [Wildcard patterns], [Inferred types], Unnamed items in [constants], [extern crates], [use declarations], and [destructuring assignment] |
| | `.` | Dot | [Field access][field], [Tuple index] |
| | `..` | DotDot | [Range][range], [Struct expressions], [Patterns], [Range Patterns][rangepat] |
| | `...` | DotDotDot | [Variadic functions][extern], [Range patterns] |
| | `..=` | DotDotEq | [Inclusive Range][range], [Range patterns] |
| | `,` | Comma | Various separators |
| | `;` | Semi | Terminator for various items and statements, [Array types] |
| | `:` | Colon | Various separators |
| | `::` | PathSep | [Path separator][paths] |
| | `->` | RArrow | [Function return type][functions], [Closure return type][closures], [Function pointer type] |
| | `=>` | FatArrow | [Match arms][match], [Macros] |
| | `<-` | LArrow | The left arrow symbol has been unused since before Rust 1.0, but it is still treated as a single token |
| | `#` | Pound | [Attributes] |
| | `$` | Dollar | [Macros] |
| | `?` | Question | [Question mark operator][question], [Questionably sized][sized], [Macro Kleene Matcher][macros] |
| | `~` | Tilde | The tilde operator has been unused since before Rust 1.0, but its token may still be used |
| |
| r[lex.token.delim] |
| ## Delimiters |
| |
| Bracket punctuation is used in various parts of the grammar. An open bracket |
| must always be paired with a close bracket. Brackets and the tokens within |
| them are referred to as "token trees" in [macros]. The three types of brackets are: |
| |
| | Bracket | Type | |
| |---------|-----------------| |
| | `{` `}` | Curly braces | |
| | `[` `]` | Square brackets | |
| | `(` `)` | Parentheses | |
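
For example, a minimal sketch in which each bracketed group is matched as a single token tree:

```rust
macro_rules! count_trees {
    ($($tree:tt)*) => { [$(stringify!($tree)),*].len() };
}
// Three token trees: a parenthesized group, a bracketed group, and a braced group.
assert_eq!(count_trees!( (1, 2) [a, b] { c } ), 3);
```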
| |
| r[lex.token.reserved] |
| ## Reserved tokens |
| |
| r[lex.token.reserved.intro] |
| Several token forms are reserved for future use. It is an error for the source input to match one of these forms. |
| |
| r[lex.token.reserved.syntax] |
| ```grammar,lexer |
| RESERVED_TOKEN -> |
| RESERVED_GUARDED_STRING_LITERAL |
| | RESERVED_NUMBER |
| | RESERVED_POUNDS |
| | RESERVED_RAW_IDENTIFIER |
| | RESERVED_RAW_LIFETIME |
| | RESERVED_TOKEN_DOUBLE_QUOTE |
| | RESERVED_TOKEN_LIFETIME |
| | RESERVED_TOKEN_POUND |
| | RESERVED_TOKEN_SINGLE_QUOTE |
| ``` |
| |
| r[lex.token.reserved-prefix] |
| ## Reserved prefixes |
| |
| r[lex.token.reserved-prefix.syntax] |
| ```grammar,lexer |
| RESERVED_TOKEN_DOUBLE_QUOTE -> |
| ( IDENTIFIER_OR_KEYWORD _except `b` or `c` or `r` or `br` or `cr`_ | `_` ) `"` |
| |
| RESERVED_TOKEN_SINGLE_QUOTE -> |
| ( IDENTIFIER_OR_KEYWORD _except `b`_ | `_` ) `'` |
| |
| RESERVED_TOKEN_POUND -> |
| ( IDENTIFIER_OR_KEYWORD _except `r` or `br` or `cr`_ | `_` ) `#` |
| |
| RESERVED_TOKEN_LIFETIME -> |
| `'` ( IDENTIFIER_OR_KEYWORD _except `r`_ | `_` ) `#` |
| ``` |
| |
| r[lex.token.reserved-prefix.intro] |
| Some lexical forms known as _reserved prefixes_ are reserved for future use. |
| |
| r[lex.token.reserved-prefix.id] |
| Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or `_`) which is immediately followed by a `#`, `'`, or `"` character (without intervening whitespace) is identified as a reserved prefix. |
| |
| r[lex.token.reserved-prefix.raw-token] |
| Note that raw identifiers, raw string literals, and raw byte string literals may contain a `#` character but are not interpreted as containing a reserved prefix. |
| |
| r[lex.token.reserved-prefix.strings] |
| Similarly the `r`, `b`, `br`, `c`, and `cr` prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes. |
| |
| r[lex.token.reserved-prefix.life] |
| Source input which would otherwise be lexically interpreted as a non-raw lifetime (or a keyword or `_`) which is immediately followed by a `#` character (without intervening whitespace) is identified as a reserved lifetime prefix. |
| |
| r[lex.token.reserved-prefix.edition2021] |
| > [!EDITION-2021] |
| > Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros). |
| > |
| > Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a `#` token). |
| > |
| > Examples accepted in all editions: |
| > ```rust |
| > macro_rules! lexes {($($_:tt)*) => {}} |
| > lexes!{a #foo} |
| > lexes!{continue 'foo} |
| > lexes!{match "..." {}} |
| > lexes!{r#let#foo} // three tokens: r#let # foo |
| > lexes!{'prefix #lt} |
| > ``` |
| > |
| > Examples accepted before the 2021 edition but rejected later: |
| > ```rust,edition2018 |
| > macro_rules! lexes {($($_:tt)*) => {}} |
| > lexes!{a#foo} |
| > lexes!{continue'foo} |
| > lexes!{match"..." {}} |
| > lexes!{'prefix#lt} |
| > ``` |
| |
| r[lex.token.reserved-guards] |
| ## Reserved guards |
| |
| r[lex.token.reserved-guards.syntax] |
| ```grammar,lexer |
| RESERVED_GUARDED_STRING_LITERAL -> `#`+ STRING_LITERAL |
| |
| RESERVED_POUNDS -> `#`{2..} |
| ``` |
| |
| r[lex.token.reserved-guards.intro] |
| The reserved guards are syntax reserved for future use, and will generate a compile error if used. |
| |
| r[lex.token.reserved-guards.string-literal] |
| The *reserved guarded string literal* is a token of one or more `U+0023` (`#`) immediately followed by a [STRING_LITERAL]. |
| |
| r[lex.token.reserved-guards.pounds] |
| The *reserved pounds* is a token of two or more `U+0023` (`#`). |
| |
| r[lex.token.reserved-guards.edition2024] |
| > [!EDITION-2024] |
| > Before the 2024 edition, reserved guards are accepted by the lexer and interpreted as multiple tokens. For example, the `#"foo"#` form is interpreted as three tokens. `##` is interpreted as two tokens. |
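
For example, a sketch of a form rejected under these rules on the 2024 edition:

```rust,edition2024,compile_fail
let s = #"foo"#; // ERROR: reserved guarded string literal, rejected by the tokenizer
```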
| |
| [Inferred types]: types/inferred.md |
| [Range patterns]: patterns.md#range-patterns |
| [Reference patterns]: patterns.md#reference-patterns |
| [Subpattern binding]: patterns.md#identifier-patterns |
| [Wildcard patterns]: patterns.md#wildcard-pattern |
| [arith]: expressions/operator-expr.md#arithmetic-and-logical-binary-operators |
| [array types]: types/array.md |
| [assignment]: expressions/operator-expr.md#assignment-expressions |
| [attributes]: attributes.md |
| [borrow]: expressions/operator-expr.md#borrow-operators |
| [closures]: expressions/closure-expr.md |
| [comparison]: expressions/operator-expr.md#comparison-operators |
| [compound]: expressions/operator-expr.md#compound-assignment-expressions |
| [constants]: items/constant-items.md |
| [dereference]: expressions/operator-expr.md#the-dereference-operator |
| [destructuring assignment]: expressions/underscore-expr.md |
| [extern crates]: items/extern-crates.md |
| [extern]: items/external-blocks.md |
| [field]: expressions/field-expr.md |
| [Floating-point literal expressions]: expressions/literal-expr.md#floating-point-literal-expressions |
| [floating-point types]: types/numeric.md#floating-point-types |
| [function pointer type]: types/function-pointer.md |
| [functions]: items/functions.md |
| [generics]: items/generics.md |
| [identifier]: identifiers.md |
| [if let]: expressions/if-expr.md#if-let-patterns |
| [Integer literal expressions]: expressions/literal-expr.md#integer-literal-expressions |
| [keywords]: keywords.md |
| [lazy-bool]: expressions/operator-expr.md#lazy-boolean-operators |
| [literal expressions]: expressions/literal-expr.md |
| [loop labels]: expressions/loop-expr.md |
| [macros]: macros-by-example.md |
| [match]: expressions/match-expr.md |
| [negation]: expressions/operator-expr.md#negation-operators |
| [negative impls]: items/implementations.md |
| [never type]: types/never.md |
| [numeric types]: types/numeric.md |
| [paths]: paths.md |
| [patterns]: patterns.md |
| [placeholder lifetime]: lifetime-elision.md |
| [question]: expressions/operator-expr.md#the-question-mark-operator |
| [range]: expressions/range-expr.md |
| [rangepat]: patterns.md#range-patterns |
| [raw pointers]: types/pointer.md#raw-pointers-const-and-mut |
| [references]: types/pointer.md |
| [sized]: trait-bounds.md#sized |
| [String continuation escapes]: expressions/literal-expr.md#string-continuation-escapes |
| [struct expressions]: expressions/struct-expr.md |
| [trait bounds]: trait-bounds.md |
| [tuple index]: expressions/tuple-expr.md#tuple-indexing-expressions |
| [tuple structs]: items/structs.md |
| [tuple variants]: items/enumerations.md |
| [tuples]: types/tuple.md |
| [unary minus operator]: expressions/operator-expr.md#negation-operators |
| [use declarations]: items/use-declarations.md |
| [use wildcards]: items/use-declarations.md |
| [while let]: expressions/loop-expr.md#while-let-patterns |