Developing a Nitra parser begins with creating a syntax module. A syntax module is a unit of translation and encapsulation. A complete parser is composed of one or more syntax modules.
The string calculator example
In Nitra, creating a new language or an extension to an existing one starts with declaring a new syntax module (see
The module is declared in a .nitra file (see below). If you use Visual Studio, you can take advantage of the syntax module template or the Nitra project template, both of which can be found in the Nemerle folder of the project creation dialog.
Let’s start with declaring an empty syntax module named
The string calculator parses expressions that consist of the arithmetic operators (+, -, *, /, (), ^) and numbers. Nitra makes it possible to describe operator parsing with a single rule. In addition to that rule, you need to define the start rule (the rule from which parsing begins) and the number parsing rules.
Here is what the Nitra string calculator grammar looks like:
The construct (which is similar to C#):
opens up the Whitespaces syntax module from the Nitra.Core.dll standard library, allowing you to refer to the members of the Whitespaces syntax module declared in it by unqualified names.
The first one is the starting rule:
Since it is marked with the StartRule attribute, a special function that simplifies starting the parse from this rule is generated for it.
The following construct:
declares the Expr extensible rule (see ExtensibleRule), which describes an arithmetic expression. The arithmetic expression consists of the binary operators +, -, *, /, ^, the unary operator -, and parentheses. The operators have precedence and associativity. For example, the value of the expression 2 + 3 * 4 is 14, as the * operator has higher precedence than +.
Number parsing is described with the following rules:
Regex rules are handy for parsing token content. They are fast and light on memory, as they are converted to a DFA and do not produce an AST.
The Digit rule parses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, whereas the Number rule parses digits with an optional fractional part after the point, e.g.
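As a rough analogue in Python (not Nitra syntax), the two rules can be pictured as one regular expression built from another. The exact shape of the patterns is an assumption here: Digit is taken to be a single decimal digit, and Number one or more digits optionally followed by a point and more digits. The names DIGIT, NUMBER and is_number are hypothetical.

```python
import re

# Assumed shapes (not copied from the Nitra grammar):
#   Digit  - a single decimal digit
#   Number - one or more digits, optionally followed by '.' and more digits
DIGIT = r"[0-9]"
NUMBER = re.compile(rf"{DIGIT}+(?:\.{DIGIT}+)?")


def is_number(text: str) -> bool:
    """Return True if the whole string matches the assumed Number pattern."""
    return NUMBER.fullmatch(text) is not None
```

Like a regex rule, the compiled pattern only recognizes text; it yields a match span rather than a tree.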
Nitra allows you to express operator precedence and associativity directly (as compared to emulating them using a recursive rule set). Besides, Nitra lets you express binary operators through the left recursion, which is more natural for the human brain to perceive.
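To make the role of precedence levels and associativity concrete, here is a minimal Python sketch of an evaluator driven by an operator table (the precedence climbing technique). It is an illustration only and says nothing about how Nitra is implemented. The numeric levels, the treatment of unary minus, and all names (OPERATORS, tokenize, evaluate) are assumptions made for this sketch.

```python
import operator
import re

# Hypothetical operator table: symbol -> (precedence, right_associative, function).
# The numeric levels are illustrative only.
OPERATORS = {
    "+": (10, False, operator.add),
    "-": (10, False, operator.sub),
    "*": (20, False, operator.mul),
    "/": (20, False, operator.truediv),
    "^": (30, True, operator.pow),
}
UNARY_MINUS_PRECEDENCE = 25  # assumed level for the unary minus

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/^()])")


def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate an arithmetic expression using precedence climbing."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_primary():
        if peek() is None:
            raise ValueError("unexpected end of input")
        token = advance()
        if token == "(":
            value = parse_expression(0)
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return value
        if token == "-":
            return -parse_expression(UNARY_MINUS_PRECEDENCE)
        return float(token)  # raises ValueError for a stray operator

    def parse_expression(min_precedence):
        left = parse_primary()
        while peek() in OPERATORS and OPERATORS[peek()][0] >= min_precedence:
            precedence, right_assoc, apply = OPERATORS[advance()]
            # A left-associative operator must not absorb another operator of
            # the same level on its right; a right-associative one may.
            next_min = precedence if right_assoc else precedence + 1
            left = apply(left, parse_expression(next_min))
        return left

    result = parse_expression(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

With this table, evaluate("2 + 3 * 4") groups the multiplication first, and a chain of ^ operators groups from the right.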
Nitra comes with special support for describing operators. All operators that are going to be used together should be described as extensions (see ExtensionRule) of a single extensible rule (see ExtensibleRule). To describe a binary (or higher-arity) infix operator, you describe a rule in which the operator (i.e. a literal or another rule describing the operator) is located between two recursive calls to the current extensible rule. For example, the + operator requires the following extension rule:
where the operator’s precedence is specified by the «precedence <name>» construct. The higher its value, the higher the precedence. In the future we’ll replace constants with declaratively specified precedence levels, which can also be extended dynamically. Should you need a right-associative operator, add the right-associative construct to the precedence declaration:
Associativity determines how a sequence of consecutive operators is grouped. For example, for the left-associative «/» operator, the following expression:
will be treated as:
If the operator were right-associative, the expression would be treated as:
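To see the difference in numbers, here is a small Python sketch (an illustration, not Nitra code) that evaluates a hypothetical chain 8 / 4 / 2 under both groupings. The function names and the sample values are made up for this example.

```python
from functools import reduce


def group_left(operands):
    """Left-associative grouping: ((a / b) / c)."""
    return reduce(lambda acc, x: acc / x, operands)


def group_right(operands):
    """Right-associative grouping: (a / (b / c))."""
    return reduce(lambda acc, x: x / acc, reversed(operands))


values = [8.0, 4.0, 2.0]  # hypothetical chain 8 / 4 / 2
left_result = group_left(values)    # (8 / 4) / 2
right_result = group_right(values)  # 8 / (4 / 2)
```

The two groupings give different results for the same operands, which is why the associativity of each operator has to be stated in the grammar.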
The grammar above lets you parse expressions. You can load the resulting parser into Nitra.Visualizer.exe or use it from your application. On its own, however, it is not very useful. Let’s make our parser calculate the values of the parsed expressions.
This can be accomplished in different ways. You could obtain the AST and analyze it in the C# application code. However, Nitra has a built-in solution for such tasks: methods declared directly on the AST.
Here is an example of the calculator grammar extended with the AST methods:
As you can see, the methods are very much like class methods, except that they are declared directly in rules rather than in classes. To learn more about these methods, see
describes the Value method, which has no parameters and returns a value of type "double" (System.Double in .NET). The method’s body consists of a single expression:
Expr is a reference to the AST field created for the reference to the Expr rule in the rule’s body. Since the Expr rule also declares the Value() method, it can be called. Its result becomes the result of the Value method in the
Nitra AST methods use Nemerle syntax. It is similar to C# but has some differences. Here you can learn more about it. Describing AST methods in C# is planned for the future. To understand the examples cited here, just keep in mind that methods do not require the return keyword. The return value of a method is the result of its last (or only) expression. Besides that, a method can be written either in short form (without curly brackets):
or in full form:
In the first case, = may be followed by only one expression. In the latter, there may be several expressions separated by semicolons.
declared in the body of the Expr extensible rule describes an abstract method. Abstract and virtual methods can be declared only in extensible rules. A method without a body (as in our example) is automatically considered abstract. If a body is specified, the method becomes virtual. An abstract method must be overridden in the extension rules:
You don’t have to repeat the method signature when overriding. Since Nitra AST methods do not support overloading, Nitra can find the signature in the extensible rule. The override keyword is required, though. It prevents you from accidentally hiding the method and eliminates ambiguity between a method definition and a nested rule.
In this example, the GetText method is called first. It receives the Number field as a parameter. This field was created because the rule body contains a call to the Number regex rule. The type of this field is NSpan, which describes a span of text. In this case, the text corresponds to the parsed numeric value. The GetText method is declared in one of the AST base classes (see Located). It returns the text that corresponds to the passed NSpan. The resulting text is passed to the .NET double.Parse function, which converts it to a double. Thus, this method transforms the parsed number into a double.
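The chain "span, then text, then number" can be sketched in Python as follows. This is an analogue only: Span is a hypothetical stand-in for NSpan, assumed here to be a half-open range of character positions, and get_text and number_value are illustrative names rather than Nitra APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for NSpan: a half-open [start, end) range."""
    start: int
    end: int


def get_text(source: str, span: Span) -> str:
    """Return the part of the source text covered by the span."""
    return source[span.start:span.end]


def number_value(source: str, span: Span) -> float:
    """Convert the text covered by the span to a floating-point value."""
    return float(get_text(source, span))


source = "2 + 3.5"
second_operand = Span(4, 7)  # covers the characters "3.5"
```

The node stores only positions; the text is looked up in the source when the value is needed.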
The override of the Value method declared in the rule that parses addition gets the values of both of its subexpressions and returns their sum.
Since the body of this rule contains two calls to the Expr rule, the field names generated for them receive an integer index (these field names can be specified explicitly as well; see
The overrides for the other rules are described in the same manner. The only difference is in the calculations performed, so there is no need to describe them.
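The overall scheme (a method declared without a body on the extensible rule and overridden in each extension) resembles an abstract method with per-subclass overrides. The following Python sketch mirrors that idea under this assumption. The class names and the expr1/expr2 fields are chosen for illustration and are not generated by Nitra.

```python
from abc import ABC, abstractmethod


class Expr(ABC):
    """Plays the role of the extensible rule: value() is declared without a body."""

    @abstractmethod
    def value(self) -> float:
        ...


class Number(Expr):
    """Plays the role of the extension that parses a number."""

    def __init__(self, text: str):
        self.text = text

    def value(self) -> float:
        return float(self.text)


class Add(Expr):
    """Plays the role of the extension for addition, with two subexpressions."""

    def __init__(self, expr1: Expr, expr2: Expr):
        self.expr1, self.expr2 = expr1, expr2

    def value(self) -> float:
        return self.expr1.value() + self.expr2.value()


class Mul(Expr):
    """Same shape as Add; only the calculation differs."""

    def __init__(self, expr1: Expr, expr2: Expr):
        self.expr1, self.expr2 = expr1, expr2

    def value(self) -> float:
        return self.expr1.value() * self.expr2.value()


# Tree for 2 + 3 * 4, built by hand here instead of by a parser.
tree = Add(Number("2"), Mul(Number("3"), Number("4")))
```

Calling tree.value() walks the tree recursively, with each node delegating to its subexpressions.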
The only thing left to be mentioned is the following construct:
This construct specifies the value that is used if the source code (in this case the expression) contains an error that caused the parser to create an AST that lacks a node for the subexpression. For instance, the expression 2 + lacks its right operand.
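The effect of such a fallback can be sketched in Python. The text above does not say which value the construct supplies, so NaN is used here purely as an assumption. Missing, Number and Add are hypothetical classes for this illustration.

```python
import math


class Number:
    def __init__(self, text):
        self.text = text

    def value(self):
        return float(self.text)


class Missing:
    """Placeholder for a subexpression that the parser could not find."""

    def value(self):
        return math.nan  # assumed fallback value


class Add:
    def __init__(self, expr1, expr2):
        self.expr1, self.expr2 = expr1, expr2

    def value(self):
        return self.expr1.value() + self.expr2.value()


# Models the erroneous input "2 + ": the right operand is absent.
tree = Add(Number("2"), Missing())
result = tree.value()
```

Because the placeholder answers the same method as a real node, evaluation does not fail on a damaged tree; it just produces the fallback-derived result.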
After you add the AST methods, you can put the calculator to work. Here is what using the calculator from a C# application looks like:
Please note that an AST is built for the expression even if an error occurred. To check for parsing errors, you can check the value of the parseResult.IsSuccess property or check that the list returned by the parseResult.GetErrors() method is empty.
Now you can compile, run, and test the calculator:
Here is an example of extending the rule from an imported syntax module. You can extend extensible rules declared either in your project or in external assemblies. Below is the example from JsonParser, which adds C-style comments to the standard whitespace rule declared in
The extend token construct lets you extend (add alternatives to) a rule declared in another module. This way the ignored whitespace characters are complemented with C/C++ comments. The rules that describe comments are stored in another standard library module –