Overriding a Parser
To recap,
the Node derive macro
automatically implements parse procedures for each enum variant annotated with
the #[rule(...)]
macro attribute. Inside the rule, you write a regex-like
parse expression in terms of the LL(1) grammars used by the macro to generate
the parse procedure. This determines the leftmost set of tokens from which
the procedure starts parsing. The leftmost set is used when you descend into
this variant in another variant's rule.
There is a possibility to replace the generated parse procedure with a manually
written Rust function using the #[parser(...)]
macro attribute.
This attribute accepts a Rust expression that must return an instance of the
enum that represents the parsing result product. As an input, you would use
the session
variable, which is a mutable reference to
the SyntaxSession
that represents the current state of the parsing environment.
Usually, inside this expression, you would call your parsing function passing
the session
variable as an argument.
From the Expr Parser example:
#[derive(Node)]
#[token(BoolToken)]
#[trivia($Whitespace)]
pub enum BoolNode {
#[root]
#[rule(expr: Expr)]
Root {
#[node]
node: NodeRef,
#[parent]
parent: NodeRef,
#[child]
expr: NodeRef,
},
#[rule($ParenOpen | $True | $False)] // Leftmost set.
#[denote(EXPR)]
#[describe("expression", "<expr>")]
#[parser(parse_expr(session))] // Overridden parser.
Expr {
#[node]
node: NodeRef,
#[parent]
parent: NodeRef,
#[child]
content: NodeRef,
},
//...
#[denote(AND)]
#[describe("operator", "<and op>")]
And {
#[node]
node: NodeRef,
#[parent]
parent: NodeRef,
#[child]
left: NodeRef,
#[child]
right: NodeRef,
},
//...
}
Leftmost Set is Required
Note that even though we are overriding the parse procedure for the
BoolNode::Expr enum variant via the #[parser(parse_expr(session))]
macro
attribute, we still have to specify the #[rule($ParenOpen | $True | $False)]
attribute too.
The macro requires this attribute because it needs to know the leftmost set of
the parser. Specifically, when we refer to the Expr variant inside the
Root' s #[rule(expr: Expr)]
parse expression, the macro knows that the Expr
parser would start parsing from the "ParenOpen", "True", or "False" tokens as
described in its rule.
Certainly, you don't need to reimplement the entire grammar of the overridden
parse function inside the #[rule(...)]
attribute (the macro will ignore it
anyway). Instead, it would be enough just to enumerate the leftmost tokens via
the |
choice operator.
Variants Denotation
Another thing to notice in this snippet is that the BoolNode::And variant does
not have a rule attribute, but instead, it has a pair of #[denote(AND)]
and #[describe("operator", "<and op>")]
macro attributes.
We don't specify the "rule" attribute here because we are going to parse this variant manually inside the "parse_expr" function too.
The denote attribute informs the macro that this variant is subject to parsing (even if it does not have an explicitly expressed grammar rule) and therefore is a legitimate part of the syntax tree.
The macro allows us to specify the #[child]
, #[parent]
, and other similar
fields in the denoted variants, assuming that their values will be assigned
manually. But more importantly, the macro reserves a parse rule number for the
denoted variant that we will use inside the manually written parser to address
this variant. The number can be accessed through the type's constant with the
name that we specify inside the attribute (BoolNode::AND
in this case).
If the variant is denoted but does not have a rule, the macro additionally
requires specifying the describe attribute, which provides the end-user
facing description of this syntax tree node variant. The first parameter is a
string that describes the general class of this node variant ("operator"
), and
the second one is a more specific description of this particular
variant ("<and op>"
). Lady Deirdre will use this metadata to format error
messages for the syntax errors.
Finally, the variants with the rule attribute are assumed to be denoted implicitly. We don't need to denote them manually, but as a rule of thumb, it is recommended denoting and describing all enum variants regardless.