blob: b5be7aa144a2508d44af3f23b7ed56328f470cb2 [file] [log] [blame] [view]
# Rust grammar
The Reference grammar is written in Markdown code blocks using a modified BNF-like syntax (with a blend of regex and other arbitrary things). The [`mdbook-spec`] extension parses these rules and converts them into a renderable format, including railroad diagrams.
The code block should have a lang string with the word `grammar`, a comma, and the category of the grammar, like this:
~~~
```grammar,items
ProductionName -> SomeExpression
```
~~~
The category is used to group similar productions on the grammar summary page in the appendix.
## Grammar syntax
The syntax for the grammar itself is similar to what is described in **[Notation]**, though there are some rendering differences.
A "root" production, marked with `@root`, is one that is not used in any other production.
The syntax for the grammar notation, described here using its own notation, is:
```
Grammar -> Production+
BACKTICK -> U+0060
LF -> U+000A
Production ->
( Comment LF )*
`@root`? Name ` ->` Expression
Name -> <Alphanumeric or `_`>+
Expression -> Sequence (` `* `|` ` `* Sequence)*
Sequence ->
(` `* AdornedExpr)* ` `* Cut
| (` `* AdornedExpr)+
AdornedExpr -> Prefix? Expr1 Quantifier? Suffix? Footnote?
Prefix -> NegativeLookahead
NegativeLookahead -> `!`
Suffix -> ` _` <not underscore, unless in backtick>* `_`
Footnote -> `[^` ~[`]` LF]+ `]`
Quantifier ->
Optional
| Repeat
| RepeatNonGreedy
| RepeatPlus
| RepeatPlusNonGreedy
| RepeatRange
| RepeatRangeInclusive
Optional -> `?`
Repeat -> `*`
RepeatNonGreedy -> `*?`
RepeatPlus -> `+`
RepeatPlusNonGreedy -> `+?`
RepeatRange -> `{` Range? `..` Range? `}`
RepeatRangeInclusive -> `{` Range? `..=` Range `}`
Range -> [0-9]+
Expr1 ->
Unicode
| NonTerminal
| Break
| Comment
| Terminal
| Charset
| Prose
| Group
| NegativeExpression
Unicode -> `U+` [`A`-`Z` `0`-`9`]4..=4
NonTerminal -> Name
Break -> LF ` `+
Comment -> `//` ~[LF]+
Terminal -> BACKTICK ~[LF]+ BACKTICK
Charset -> `[` (` `* Characters)+ ` `* `]`
Characters ->
CharacterRange
| CharacterTerminal
| CharacterName
CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK
CharacterTerminal -> Terminal
CharacterName -> Name
Prose -> `<` ~[`>` LF]+ `>`
Group -> `(` ` `* Expression ` `* `)`
NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )
Cut -> `^` Sequence
```
The general format is a series of productions separated by blank lines. The expressions are as follows:
| Expression | Example | Description |
|------------|---------|-------------|
| Unicode | U+0060 | A single Unicode character. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | | Used internally by the renderer to detect line breaks and indentation. |
| Comment | // Single line comment. | A comment extending to the end of the line. |
| Terminal | \`example\` | A sequence of exact characters, surrounded by backticks. |
| Charset | \[ \`A\`-\`Z\` \`0\`-\`9\` \`_\` \] | A choice from a set of characters, space-separated. There are three different forms. |
| CharacterRange | \[ \`A\`-\`Z\` \] | A range of characters; each character should be in backticks. |
| CharacterTerminal | \[ \`x\` \] | A single character, surrounded by backticks. |
| CharacterName | \[ LF \] | A nonterminal, referring to another production. |
| Prose | \<any ASCII character except CR\> | An English description of what should be matched, surrounded in angle brackets. |
| Group | (\`,\` Parameter)+ | Groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~\[\` \` LF\] | Matches anything except the given Charset, Terminal, or Nonterminal. |
| Cut | Expr1 ^ Expr2 \| Expr3 | The hard cut operator. Once the expressions preceding `^` in the sequence match, the rest of the sequence must match or parsing fails unconditionally --- no enclosing expression can backtrack past the cut point. |
| Sequence | \`fn\` Name Parameters | A sequence of expressions that must match in order. |
| Alternation | Expr1 \| Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | \_except \[LazyBooleanExpression\]\_ | Adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can contain limited Markdown, but try to avoid anything except basics like links. |
| Footnote | \[^extern-safe\] | Adds a footnote, which can supply extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal Markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| NegativeLookahead | !Expr | Matches if Expr does not follow, without consuming any input. |
| Repeat | Expr* | The preceding expression is repeated 0 or more times. |
| RepeatNonGreedy | Expr*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlusNonGreedy | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated between the range of times specified. Either bound can be excluded, which works just like Rust ranges. |
| RepeatRangeInclusive | Expr{2..=4} | The preceding expression is repeated between the inclusive range of times specified. The lower bound can be omitted. |
## Automatic linking
The [`mdbook-spec`] plugin automatically adds Markdown link definitions for all production names on every page. To link directly to a production name, simply surround it in square brackets, like `[ArrayExpression]`.
In some cases, there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the `grammar-` prefix, such as `[Type][grammar-Type]`. The prefix can also be used when explicitness would aid clarity.
[`mdbook-spec`]: tooling/mdbook-spec.md
[Notation]: https://doc.rust-lang.org/nightly/reference/notation.html