The Reference grammar is written in Markdown code blocks using a modified BNF-like syntax (with a blend of regex and other arbitrary things). The mdbook-spec extension parses these rules and converts them into a renderable format, including railroad diagrams.
The code block should have a lang string with the word grammar, a comma, and the category of the grammar, like this:
```grammar,items ProductionName -> SomeExpression ```
The category is used to group similar productions on the grammar summary page in the appendix.
The syntax for the grammar itself is similar to what is described in Notation, though there are some rendering differences.
A “root” production, marked with @root, is one that is not used in any other production.
The syntax for the grammar notation, described here using its own notation, is:
Grammar -> Production+
BACKTICK -> U+0060
LF -> U+000A
Production ->
( Comment LF )*
`@root`? Name ` ->` Expression
Name -> <Alphanumeric or `_`>+
Expression -> Sequence (` `* `|` ` `* Sequence)*
Sequence ->
(` `* AdornedExpr)* ` `* Cut
| (` `* AdornedExpr)+
AdornedExpr -> Prefix? Expr1 Quantifier? Suffix? Footnote?
Prefix -> NegativeLookahead
NegativeLookahead -> `!`
Suffix -> ` _` <not underscore, unless in backtick>* `_`
Footnote -> `[^` ~[`]` LF]+ `]`
Quantifier ->
Optional
| Repeat
| RepeatNonGreedy
| RepeatPlus
| RepeatPlusNonGreedy
| RepeatRange
| RepeatRangeInclusive
Optional -> `?`
Repeat -> `*`
RepeatNonGreedy -> `*?`
RepeatPlus -> `+`
RepeatPlusNonGreedy -> `+?`
RepeatRange -> `{` Range? `..` Range? `}`
RepeatRangeInclusive -> `{` Range? `..=` Range `}`
Range -> [0-9]+
Expr1 ->
Unicode
| NonTerminal
| Break
| Comment
| Terminal
| Charset
| Prose
| Group
| NegativeExpression
Unicode -> `U+` [`A`-`Z` `0`-`9`]4..=4
NonTerminal -> Name
Break -> LF ` `+
Comment -> `//` ~[LF]+
Terminal -> BACKTICK ~[LF]+ BACKTICK
Charset -> `[` (` `* Characters)+ ` `* `]`
Characters ->
CharacterRange
| CharacterTerminal
| CharacterName
CharacterRange -> BACKTICK <any char> BACKTICK `-` BACKTICK <any char> BACKTICK
CharacterTerminal -> Terminal
CharacterName -> Name
Prose -> `<` ~[`>` LF]+ `>`
Group -> `(` ` `* Expression ` `* `)`
NegativeExpression -> `~` ( Charset | Terminal | NonTerminal )
Cut -> `^` Sequence
The general format is a series of productions separated by blank lines. The expressions are as follows:
| Expression | Example | Description |
|---|---|---|
| Unicode | U+0060 | A single Unicode character. |
| NonTerminal | FunctionParameters | A reference to another production by name. |
| Break | Used internally by the renderer to detect line breaks and indentation. | |
| Comment | // Single line comment. | A comment extending to the end of the line. |
| Terminal | `example` | A sequence of exact characters, surrounded by backticks. |
| Charset | [ `A`-`Z` `0`-`9` `_` ] | A choice from a set of characters, space-separated. There are three different forms. |
| CharacterRange | [ `A`-`Z` ] | A range of characters; each character should be in backticks. |
| CharacterTerminal | [ `x` ] | A single character, surrounded by backticks. |
| CharacterName | [ LF ] | A nonterminal, referring to another production. |
| Prose | <any ASCII character except CR> | An English description of what should be matched, surrounded in angle brackets. |
| Group | (`,` Parameter)+ | Groups an expression for the purpose of precedence, such as applying a repetition operator to a sequence of other expressions. |
| NegativeExpression | ~[` ` LF] | Matches anything except the given Charset, Terminal, or Nonterminal. |
| Cut | Expr1 ^ Expr2 | Expr3 | The hard cut operator. Once the expressions preceding ^ in the sequence match, the rest of the sequence must match or parsing fails unconditionally --- no enclosing expression can backtrack past the cut point. |
| Sequence | `fn` Name Parameters | A sequence of expressions that must match in order. |
| Alternation | Expr1 | Expr2 | Matches only one of the given expressions, separated by the vertical pipe character. |
| Suffix | _except [LazyBooleanExpression]_ | Adds a suffix to the previous expression to provide an additional English description, rendered in subscript. This can contain limited Markdown, but try to avoid anything except basics like links. |
| Footnote | [^extern-safe] | Adds a footnote, which can supply extra information that may be helpful to the user. The footnote itself should be defined outside of the code block like a normal Markdown footnote. |
| Optional | Expr? | The preceding expression is optional. |
| NegativeLookahead | !Expr | Matches if Expr does not follow, without consuming any input. |
| Repeat | Expr* | The preceding expression is repeated 0 or more times. |
| RepeatNonGreedy | Expr*? | The preceding expression is repeated 0 or more times without being greedy. |
| RepeatPlus | Expr+ | The preceding expression is repeated 1 or more times. |
| RepeatPlusNonGreedy | Expr+? | The preceding expression is repeated 1 or more times without being greedy. |
| RepeatRange | Expr{2..4} | The preceding expression is repeated between the range of times specified. Either bound can be excluded, which works just like Rust ranges. |
| RepeatRangeInclusive | Expr{2..=4} | The preceding expression is repeated between the inclusive range of times specified. The lower bound can be omitted. |
The mdbook-spec plugin automatically adds Markdown link definitions for all production names on every page. To link directly to a production name, simply surround it in square brackets, like [ArrayExpression].
In some cases, there might be name collisions with the automatic linking of rule names. In that case, disambiguate with the grammar- prefix, such as [Type][grammar-Type]. The prefix can also be used when explicitness would aid clarity.