Grammar Setup

The central component of your compiler is the Analyzer object. This object is responsible for managing the set of documents within the compilation project and their semantic graph. The Analyzer's API is covered in later chapters; this chapter focuses on configuring the programming language grammar.

The first generic parameter N of the Analyzer represents the type of the language grammar. In practice, this parameter is the enum type of the syntax tree node. However, to fully describe the grammar, including its semantics, you need to extend this enum type with additional metadata:

  1. Annotate enum variants that serve as the top nodes of the scopes with the #[scope] macro attribute.
  2. Add a semantics field to each parsable (and denoted) enum variant, annotated with #[semantics].
  3. Optionally, you can specify the syntax tree classifier using the #[classifier] macro attribute.

From the Chain Analysis example:

#[derive(Node)]
#[token(ChainToken)]
#[trivia($Whitespace)]
#[classifier(ChainNodeClassifier)] // Node classifier (this attribute is optional).
pub enum ChainNode {
    #[root]
    #[rule(block: Block)]
    Root {
        #[node]
        node: NodeRef,
        #[parent]
        parent: NodeRef,
        #[child]
        block: NodeRef,
        
        // Fields annotated with this macro attribute must be present in each
        // variant body, and they must be of type `Semantics`.
        #[semantics] 
        semantics: Semantics<VoidFeature<ChainNode>>,
    },

    #[rule($BraceOpen statements: (Block | Assignment)* $BraceClose)]
    #[scope] // This node is the top node of the scope.
    Block {
        #[node]
        node: NodeRef,
        #[parent]
        parent: NodeRef,
        #[child]
        statements: Vec<NodeRef>,
        #[semantics]
        semantics: Semantics<BlockSemantics>,
    },

    #[rule(key: Key $Assign value: (Ref | Num) $Semicolon)]
    Assignment {
        #[node]
        node: NodeRef,
        #[parent]
        parent: NodeRef,
        #[child]
        key: NodeRef,
        #[child]
        value: NodeRef,
        #[semantics]
        semantics: Semantics<VoidFeature<ChainNode>>,
    },
    
    // ...
}

Semantics Field

Each variant of the node enum must contain a semantics field of type Semantics, annotated with the #[semantics] attribute.

This field will be automatically initialized (see footnote 1) and managed by the macro-generated code.

Through this field, you can access semantic graph attributes that describe the semantics specific to each node.

The Semantics object is parameterized by a user-defined type, typically a struct, that enumerates all semantic attributes logically associated with the node. In the example above, the Semantics of the ChainNode::Block variant is parameterized by the BlockSemantics type.

If a node variant doesn't have any attributes, you can parameterize its Semantics object with the VoidFeature type, as seen in the Root and Assignment node variants.

Footnote 1: To initialize this field manually in a hand-written parser, use the Semantics::new function, passing the current NodeRef obtained from the SyntaxSession::node_ref function.
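To make the relationship between a Semantics object and its feature parameter more concrete, here is a minimal, self-contained toy model. ToySemantics, ToyVoidFeature, ToyBlockSemantics, ToyNode and NodeId are hypothetical stand-ins invented for this sketch; they are not the library's definitions. The sketch only illustrates the idea of a per-node wrapper around a user-defined feature, plus an attribute-less feature that occupies no memory.

```rust
use std::marker::PhantomData;

// Hypothetical node identifier for this toy model (not the library's NodeRef).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub usize);

// Toy stand-in for `Semantics<F>`: pairs a feature value with the node it describes.
pub struct ToySemantics<F> {
    node: NodeId,
    feature: F,
}

impl<F: Default> ToySemantics<F> {
    // Loosely mirrors the idea of constructing semantics from a node reference.
    pub fn new(node: NodeId) -> Self {
        Self { node, feature: F::default() }
    }
}

impl<F> ToySemantics<F> {
    pub fn node(&self) -> NodeId {
        self.node
    }

    pub fn feature(&self) -> &F {
        &self.feature
    }
}

// Toy stand-in for `VoidFeature<N>`: a feature with no attributes.
// `PhantomData<N>` ties it to the node type without storing anything.
pub struct ToyVoidFeature<N>(PhantomData<N>);

// Implemented by hand: `#[derive(Default)]` would add an unnecessary `N: Default` bound.
impl<N> Default for ToyVoidFeature<N> {
    fn default() -> Self {
        ToyVoidFeature(PhantomData)
    }
}

// Toy feature with a single value slot.
#[derive(Default)]
pub struct ToyBlockSemantics {
    pub assignment_count: usize,
}

// Placeholder node type; it is never instantiated in this toy.
pub enum ToyNode {}
```

For example, `ToySemantics::<ToyVoidFeature<ToyNode>>::new(NodeId(0))` plays the role of the Root variant's semantics, while `ToySemantics::<ToyBlockSemantics>::new(NodeId(1))` plays the role of the Block variant's semantics. Because the void feature is zero-sized, attaching it to every attribute-less variant costs nothing.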

Feature Objects

The type you use as a parameter of the Semantics object is called a feature.

Typically, a semantic feature is a user-defined struct type that implements the Feature trait through the Feature derive macro. This struct consists of fields that are either attributes or other feature objects.

#[derive(Feature)]
#[node(ChainNode)] // Required by the derive macro.
pub struct BlockSemantics {
    #[scoped]
    pub analysis: Attr<BlockAnalysis>,
    pub assignments: Attr<Shared<BlockAssignmentMap>>,
    pub blocks: Attr<Shared<BlockNamespaceMap>>,
    pub namespace: Attr<Shared<BlockNamespace>>,
}

In the above code, all fields are semantic attributes (Attr types), but you can also use other features as field types to build more complex nested structures. You can reuse the same feature type and attribute types in multiple places, as long as each feature or attribute logically belongs to a different syntax tree node. The Analyzer treats them as independent instances.
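One way to picture why reusing a feature type across nodes yields independent instances is to identify each attribute by its owning node plus its field path. The toy below is a hypothetical model built on that assumption: NodeId, AttrId, AttrStore, InnerFeature and OuterFeature are invented for illustration and say nothing about how the Analyzer actually stores attributes.

```rust
use std::collections::HashMap;

// Hypothetical toy model, not the library's API: a node is identified by an index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeId(pub usize);

// An attribute instance is identified by the node that owns it plus the
// field path inside that node's feature.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct AttrId {
    pub node: NodeId,
    pub path: &'static str,
}

// Shared storage for attribute values (plain integers in this toy).
#[derive(Default)]
pub struct AttrStore {
    values: HashMap<AttrId, i64>,
}

impl AttrStore {
    pub fn set(&mut self, id: AttrId, value: i64) {
        self.values.insert(id, value);
    }

    pub fn get(&self, id: AttrId) -> Option<i64> {
        self.values.get(&id).copied()
    }
}

// A nested feature: it refers to attributes but owns no storage itself.
pub struct InnerFeature {
    pub total: AttrId,
}

// An outer feature that has one attribute of its own and embeds another feature.
pub struct OuterFeature {
    pub local: AttrId,
    pub inner: InnerFeature,
}

impl OuterFeature {
    // Every attribute id is derived from the owning node, so two nodes that
    // use the same feature type get distinct, independent attribute instances.
    pub fn new(node: NodeId) -> Self {
        Self {
            local: AttrId { node, path: "local" },
            inner: InnerFeature {
                total: AttrId { node, path: "inner.total" },
            },
        }
    }
}
```

Writing to the attributes of `OuterFeature::new(NodeId(1))` leaves those of `OuterFeature::new(NodeId(2))` untouched, even though both values have the same type and the same nested layout.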

Additionally, in the BlockSemantics definition, we annotated the analysis field with #[scoped]. This annotation informs the Analyzer that this attribute (or feature) is an entry point of the semantic model: it performs the initial inspection of the syntax tree's scoped branch and maps it to the semantic model's initial objects.

Features with scoped attributes should be used as semantic objects of scoped nodes (BlockSemantics is the semantics of the ChainNode::Block, which is a #[scope]).
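To illustrate what an "initial inspection of a scoped branch" might look like for the Chain language, here is a toy sketch over a simplified, hand-built syntax tree. Stmt, Value, ToyBlockAnalysis and analyze_block are hypothetical names invented for this sketch, and it assumes that a nested Block, being a #[scope] node itself, is analyzed by its own scoped attribute, so the outer inspection records it without descending into it.

```rust
use std::collections::HashMap;

// Hypothetical simplified syntax tree of the Chain language.
#[derive(Debug, Clone)]
pub enum Value {
    Num(u64),
    Ref(String),
}

#[derive(Debug, Clone)]
pub enum Stmt {
    Assignment { key: String, value: Value },
    Block(Vec<Stmt>),
}

// Toy counterpart of BlockAnalysis: what a scoped attribute might collect
// while inspecting one scope's branch.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ToyBlockAnalysis {
    // Assignment key -> index of the assignment statement within the block.
    pub assignments: HashMap<String, usize>,
    // Indices of the nested block statements.
    pub blocks: Vec<usize>,
}

// Inspects only the direct children of one block. Nested blocks are recorded
// but not descended into, because each of them is a separate scope.
pub fn analyze_block(statements: &[Stmt]) -> ToyBlockAnalysis {
    let mut analysis = ToyBlockAnalysis::default();

    for (index, stmt) in statements.iter().enumerate() {
        match stmt {
            Stmt::Assignment { key, .. } => {
                // A later assignment to the same key overwrites the earlier entry.
                analysis.assignments.insert(key.clone(), index);
            }
            Stmt::Block(_) => analysis.blocks.push(index),
        }
    }

    analysis
}
```

For a block such as `{ x = 1; { y = x; } z = x; }`, this sketch maps `x` and `z` to their statement indices, records the inner block's position, and leaves `y` to the inner block's own analysis.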

Attributes

We will discuss attributes in more detail in the next chapters, but to give you a brief overview, the generic parameter of Attr specifies the type of the attribute value. This value is part of the semantic model and can be any user-defined type (e.g., a struct or an enum) equipped with a function that computes this value based on the syntax tree values and other attribute values.

#[derive(Default, Clone, PartialEq, Eq)]
pub struct BlockAnalysis {
    pub assignments: Shared<BlockAssignmentMap>,
    pub blocks: Shared<BlockNamespaceMap>,
}

impl Computable for BlockAnalysis {
    type Node = ChainNode;

    fn compute<H: TaskHandle, S: SyncBuildHasher>(
        context: &mut AttrContext<Self::Node, H, S>,
    ) -> AnalysisResult<Self> {
        // Compute a BlockAnalysis instance from the inputs provided by the
        // `context` (syntax tree nodes and other attribute values).
        // The actual body is omitted here; `todo!()` keeps the listing compilable.
        todo!()
    }
}

The general requirements imposed on this type are that it should implement the Clone, Eq, and Computable traits.
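The text does not spell out why Clone and Eq are required. One plausible reason, stated here as an assumption rather than as the Analyzer's documented behavior, is incremental recomputation: when an attribute is recomputed, comparing the new value with the cached one (Eq) tells whether anything downstream needs to be invalidated, and handing out owned copies of cached values needs Clone. The toy AttrSlot below is a hypothetical illustration of that idea only.

```rust
// Hypothetical toy cache slot for one attribute value, not the library's implementation.
pub struct AttrSlot<T> {
    value: Option<T>,
    revision: u64,
}

impl<T: Clone + Eq> AttrSlot<T> {
    pub fn new() -> Self {
        Self { value: None, revision: 0 }
    }

    // Stores a freshly computed value. The revision is bumped only if the new
    // value differs from the cached one (this comparison is why `Eq` is needed),
    // so readers that remember an older revision can tell whether to recompute.
    // Returns `true` if the cached value actually changed.
    pub fn store(&mut self, new_value: T) -> bool {
        if self.value.as_ref() == Some(&new_value) {
            return false;
        }
        self.value = Some(new_value);
        self.revision += 1;
        true
    }

    // Returns an owned copy of the cached value (this is why `Clone` is needed).
    pub fn read(&self) -> Option<T> {
        self.value.clone()
    }

    pub fn revision(&self) -> u64 {
        self.revision
    }
}
```

In this sketch, storing a value equal to the cached one is a no-op that leaves the revision unchanged, which is the kind of early cutoff that makes an Eq bound worthwhile in incremental systems.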