Irony - Language Implementation Kit/Introduction

Basic Irony Tutorial


In this tutorial I will go through the steps to create a basic language called “GridWorld”. Keep two things in mind: first, this tutorial is not intended to provide exhaustive knowledge of the Irony tool; second, working through the sample projects in the Irony download is strongly recommended.

 

Chapter 1 -- GridWorld

I. Introduction to GridWorld

We will start off by creating a simple language that describes a grid of a certain height and width, starts at some location within the grid, and moves around within it. Here is some possible source code for the GridWorld language:

 

Create a 10 by 10 grid.

Start at location 1,1.     (1)

Move down 3.               (2)

Move right 3.              (3)

Move up 1.                 (4)

 

And here is the result. Each visited square is marked with the number of the statement that reached it, locations are written as (row, column), and (0,0) is the top-left square:

 


0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 2 0 0 0 0 0 0 0 0
0 2 0 0 4 0 0 0 0 0
0 2 3 3 3 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

 

You might have already reasoned that it wouldn’t be too difficult to tackle this problem without any parsing tool at all. The first two lines of our language will always be the same in every program that we make, with different numbers in place of “10 by 10” and “1,1”, and the next few lines are quite repetitive. The program would become more complex if the coder also had to validate the syntax of this toy language. Even so, in this case it would be a relatively easy task.

 

However, consider writing the same program for a much more complex language such as Python or C#. A hand-written approach from scratch, though not impossible, would take a lot of time. Irony offers a robust way to parse language structure. This GridWorld toy problem is a simple basis on which to learn how to tackle much more complex grammars in the future.

 

II. Creating a GridWorld Grammar class

The first step in attacking this problem with Irony is to create a Grammar class. This class will act like a generalizing schema to parse source code. First, we’ll need to get the Irony library.

 

1.     Download Irony. Go to https://github.com/IronyProject/Irony and get the latest version.

2.     Open the main project and build it in Visual Studio.

3.     This should generate Irony.dll in “irony-XXXX/Irony/bin/Debug”.

4.     Add Irony.dll to your project.

 

Now, create a class that extends Irony.Parsing.Grammar:

 

using Irony.Parsing;

public class GridWorldGrammar : Grammar
{
    public GridWorldGrammar() {}
}

 

One option we have at this point is to control case sensitivity. The Grammar class has a constructor that takes a bool for this purpose. By default the grammar is case sensitive, so to support phrases such as “moVE RigHT 1” we could have passed false instead:

 

public GridWorldGrammar() : base(false) {}

 

In this class, we define the rules of the GridWorld language in BNF. Our language structure has 3 parts to it: the create statement, the start statement, and one or more move statements. We’ll call the entity that contains all three of these parts the program. The createStatement is defined with the words “Create” and “a” followed by a number, “by”, another number, “grid” and a period. A moveStatement contains a direction and a number. Everything else in our language is defined in a similar fashion. In BNF, this is…

 

program   ::=   createStatement   startStatement   moveStatements

createStatement   ::=   “Create”   “a”   number   “by”   number   “grid”   “.”

startStatement   ::=   “Start”   “at”   “location”   number   “,”   number   “.”

moveStatements   ::=   moveStatement +

moveStatement   ::=   “Move”   direction   number   “.”

direction   ::=   “up”   |   “down”   |   “right”   |   “left”

 

BNF, or Backus-Naur Form, is an important notation that Irony uses to understand grammar definitions. If you are not familiar with it, no sweat, it’s easy to pick up. There are plenty of quick online tutorials, and Google is your friend.

 

Luckily, this BNF description translates almost directly into C#, because Irony overloads the + and | operators. First, we must define our Terminals and NonTerminals. Terminals are often predefined constructs, like a number or string. Here, the only Terminal we need to declare ourselves is a number, which we’ll define with a regular expression as “a series of one or more digits”:

 

RegexBasedTerminal number = new RegexBasedTerminal("number", "[0-9]+");

 

Each rule on the left-hand side of our BNF is declared as a NonTerminal like so:

 

NonTerminal program = new NonTerminal("program"),
    createStatement = new NonTerminal("createStatement"),
    startStatement = new NonTerminal("startStatement"),
    moveStatements = new NonTerminal("moveStatements"),
    moveStatement = new NonTerminal("moveStatement"),
    direction = new NonTerminal("direction");

 

Each string passed to the NonTerminal constructor will be important later on when we navigate a parse tree. Now we can translate the BNF statements into something Irony can understand. The following is the real meat of our language.

 

program.Rule = createStatement + startStatement + moveStatements;

 

This should look familiar; this was the first statement in our BNF rules.

Saying this+that intuitively means “this before that.” Similarly, this|that means “this or that” as in BNF.

 

createStatement.Rule = ToTerm("Create") + "a" + number + "by" + number
    + "grid" + ".";

startStatement.Rule = ToTerm("Start") + "at" + "location" + number + ","
    + number + ".";

 

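Whenever a rule begins with two or more string literals in a row, as in both rules above, wrap the first one in ToTerm(). The C# compiler will accept the rule without it, but it evaluates "Create" + "a" as ordinary string concatenation before Irony’s overloaded + operator gets involved, so the parser would end up looking for the single keyword “Createa”. Wrapping the first string literal of every rule in ToTerm() is a good habit that avoids this kind of erroneous parsing.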
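To see why the leading ToTerm() matters, here is a small sketch of how C# resolves + when string literals sit next to a type with overloaded operators. ToyTerm, Wrap, and ToTermDemo are made-up names for illustration and are not part of Irony; the sketch only mimics the general shape of such operator overloads.

```csharp
using System;

// Toy stand-in for a grammar term. This is NOT Irony's BnfTerm; it has just
// enough to show which "+" the C# compiler picks for each pair of operands.
public sealed class ToyTerm
{
    public string Text { get; }
    public ToyTerm(string text) { Text = text; }

    // Each overload joins its operands with ", " so the pieces stay visible.
    public static ToyTerm operator +(ToyTerm left, ToyTerm right)
        => new ToyTerm(left.Text + ", " + right.Text);
    public static ToyTerm operator +(ToyTerm left, string right)
        => new ToyTerm(left.Text + ", " + right);
    public static ToyTerm operator +(string left, ToyTerm right)
        => new ToyTerm(left + ", " + right.Text);
}

public static class ToTermDemo
{
    // Hypothetical helper playing the role that ToTerm() plays in a grammar.
    public static ToyTerm Wrap(string text) => new ToyTerm(text);

    public static string WithoutWrap()
    {
        var number = new ToyTerm("number");
        // "Create" + "a" is plain string concatenation and happens first;
        // only the final "+" reaches an overloaded operator.
        ToyTerm rule = "Create" + "a" + number;
        return rule.Text;
    }

    public static string WithWrap()
    {
        var number = new ToyTerm("number");
        // The first operand is already a ToyTerm, so every "+" is overloaded.
        ToyTerm rule = Wrap("Create") + "a" + number;
        return rule.Text;
    }
}
```

ToTermDemo.WithoutWrap() returns “Createa, number”, with the two words fused into one, while ToTermDemo.WithWrap() returns “Create, a, number”.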

 

moveStatements.Rule = MakePlusRule(moveStatements, moveStatement);

moveStatement.Rule = ToTerm("Move") + direction + number + ".";

direction.Rule = ToTerm("up") | "down" | "right" | "left";

 

Unfortunately, the “one or more” and “zero or more” rules cannot be written quite as directly. These are called “plus” and “star” rules, respectively, after the + and * quantifiers in regular expressions. Irony provides the MakePlusRule and MakeStarRule functions to define them. Both are called the same way, and MakePlusRule can be seen above in the moveStatements rule.


One more statement is required to finish the grammar:

 

this.Root = program;

 

And another statement that will help keep the tree “clean”:

 

MarkPunctuation("Create", "a", "grid", "by", "Start", "at",
    "location", ",", ".", "Move");

 

We’ll discuss this last statement in a little more depth in section IV. That’s it! We have adequately defined a GridWorld grammar to parse a GridWorld program. See the appendix for the complete GridWorldGrammar class.

 

III. Parsing source code & validating syntax

Let’s say we have a GUI that has a Textbox, a Button, and a Label. Our Textbox will allow the user to type or paste source code that is in GridWorld syntax. Once a user places their code there, they press the Button, and the Label will state whether or not the syntax is of the right form.

 

With Irony, this is a very simple operation. Here is a function that returns whether or not some given source code with a given grammar object is valid:

 

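public bool isValid(string sourceCode, Grammar grammar)
{
    // Build a parser from the grammar, then try to parse the source.
    LanguageData language = new LanguageData(grammar);
    Parser parser = new Parser(language);
    ParseTree parseTree = parser.Parse(sourceCode);

    // A null root means no tree could be built, so the source is invalid.
    ParseTreeNode root = parseTree.Root;
    return root != null;
}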

 

The method this function uses is simple: attempt to parse the given source code with the given grammar’s rules. If this attempt fails, then the source code is not valid. Otherwise, it is valid. In this case we know the parsing failed when the root of the parse tree doesn’t exist (is null).

 

Obviously, there are different ways to implement this code just as there are different ways to describe our GridWorld language to Irony in the previous section. It is strongly suggested to explore other possibilities, increase understanding and knowledge of how Irony works, and thus reap the full benefits of the Irony tool.

 

IV. Navigating the language tree & validating content

We’ve seen how Irony offers an easy and fast way to validate syntax. Now we’re going to take a look at how to examine the content of a parsed language tree. The following code is a function that returns the root of a parsed language tree. As you can see, it looks very similar to the isValid function above.

 

public ParseTreeNode getRoot(string sourceCode, Grammar grammar)
{
    LanguageData language = new LanguageData(grammar);
    Parser parser = new Parser(language);
    ParseTree parseTree = parser.Parse(sourceCode);
    ParseTreeNode root = parseTree.Root;
    return root;
}

 

By using this function, we can explore and display the parsed source code. We’ll use the following function to print the tree to the console:

 

public void dispTree(ParseTreeNode node, int level)
{
    // Indent two spaces for each level of depth.
    for (int i = 0; i < level; i++)
        Console.Write("  ");
    Console.WriteLine(node);

    // Print the children one level deeper.
    foreach (ParseTreeNode child in node.ChildNodes)
        dispTree(child, level + 1);
}

 

Now, let’s parse some code and compare outputs. Recall this statement we used at the very end of the grammar in section II:

 

MarkPunctuation("Create", "a", "grid", "by", "Start", "at",
    "location", ",", ".", "Move");

 

If we had not included this statement, the parsed tree from the source code in Figure 1 would look like that of Figure 2. However, since we added the statement to “help keep the tree clean,” the parsed tree looks like Figure 3. As you can see, MarkPunctuation is a very handy function.

 


Figure 1 (source code):

Create a 10 by 10 grid.
Start at location 1,1.
Move down 3.

Figure 2 (parse tree without MarkPunctuation):

program
  createStatement
    Create (Keyword)
    a (Keyword)
    10 (number)
    by (Keyword)
    10 (number)
    grid (Keyword)
    . (Key symbol)
  startStatement
    Start (Keyword)
    at (Keyword)
    location (Keyword)
    1 (number)
    , (Key symbol)
    1 (number)
    . (Key symbol)
  moveStatements
    moveStatement
      Move (Keyword)
      direction
        down (Keyword)
      3 (number)
      . (Key symbol)

Figure 3 (parse tree with MarkPunctuation):

program
  createStatement
    10 (number)
    10 (number)
  startStatement
    1 (number)
    1 (number)
  moveStatements
    moveStatement
      direction
        down (Keyword)
      3 (number)

 

Note: the names in this tree do not derive from the names of the Terminal and NonTerminal variables we used when defining the grammar, but rather from the string argument sent to the constructor of each variable when they were each declared.

 

There are now a number of ways to go about reading the tree and producing some output. Note that this makes your language an interpreted language, not a compiled one; you must write the interpreter. I cooked up a function that uses the tree format in Figure 3 to create a grid similar to the one in section I. See it in the appendix or in the example code included with this tutorial.
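Before any interpreter can draw a grid, it has to pull the numbers and directions out of the tree. The following is a minimal sketch of that step, assuming the tree has the layout of Figure 3; it is not the function from the appendix. Node is a hypothetical stand-in for Irony’s ParseTreeNode so that the sketch runs without the library, and TreeReader, ReadPair, and ReadMoves are made-up names. With a real Irony tree you would read node.Term.Name, node.Token.Text, and node.ChildNodes in place of Name, Text, and Children.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parse tree node, shaped like Figure 3.
public sealed class Node
{
    public string Name { get; }     // with Irony: node.Term.Name
    public string Text { get; }     // with Irony: node.Token.Text (tokens only)
    public List<Node> Children { get; } = new List<Node>();  // node.ChildNodes

    public Node(string name, string text = null, params Node[] children)
    {
        Name = name;
        Text = text;
        Children.AddRange(children);
    }
}

public static class TreeReader
{
    // Reads the two number tokens under a createStatement or startStatement.
    public static (int First, int Second) ReadPair(Node statement)
    {
        int first = int.Parse(statement.Children[0].Text);
        int second = int.Parse(statement.Children[1].Text);
        return (first, second);
    }

    // Reads every moveStatement under moveStatements as (direction, steps).
    public static List<(string Direction, int Steps)> ReadMoves(Node moveStatements)
    {
        var moves = new List<(string Direction, int Steps)>();
        foreach (Node move in moveStatements.Children)
        {
            // Child 0 is the direction node, whose only child is the keyword.
            string direction = move.Children[0].Children[0].Text;
            // Child 1 is the number token.
            int steps = int.Parse(move.Children[1].Text);
            moves.Add((direction, steps));
        }
        return moves;
    }
}
```

The fixed child positions work only because MarkPunctuation removed the keywords and symbols; against the tree in Figure 2 the indexes would have to change.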

 

Checking content, as well as validating syntax, is an important part of validating source code. Being able to determine which parts of the code are correct or incorrect lets the developer communicate problems to the user, often through syntax highlighting.

 

For example, the function I wrote to display a grid from the language tree can easily be broken if the user writes GridWorld code that moves off the grid, or tells the interpreter to start at a location not on the grid. These mistakes would not be caught by the syntax validator function seen above; they are effectively run-time errors. The developer could decide to check for these sorts of errors and then highlight the offending numbers in red. A content checker like this spares the user a good deal of confusion and frustration.
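Here is a rough sketch of what such a content check could look like once the numbers have been read out of the tree. It is not the function from the appendix: GridWorldRunner and Run are made-up names, and the sketch assumes that “Create a R by C grid” means R rows and C columns, that direction keywords arrive in lower case, and that each visited square is marked with its statement number as in the grid in section I.

```csharp
using System;
using System.Collections.Generic;

public static class GridWorldRunner
{
    // Runs a GridWorld program and returns the marked grid. Throws an
    // ArgumentException that describes the offending statement when the
    // start location or a move falls outside the grid.
    public static int[,] Run(int rows, int cols, int startRow, int startCol,
                             IList<(string Direction, int Steps)> moves)
    {
        if (rows <= 0 || cols <= 0)
            throw new ArgumentException("The grid needs at least one row and one column.");
        if (startRow < 0 || startRow >= rows || startCol < 0 || startCol >= cols)
            throw new ArgumentException(
                $"Start location {startRow},{startCol} is not on the {rows} by {cols} grid.");

        var grid = new int[rows, cols];
        int row = startRow, col = startCol;
        grid[row, col] = 1;                    // statement (1): the start square

        for (int i = 0; i < moves.Count; i++)
        {
            // Translate the direction keyword into a (row, column) step.
            int dRow = 0, dCol = 0;
            switch (moves[i].Direction)
            {
                case "up": dRow = -1; break;
                case "down": dRow = 1; break;
                case "left": dCol = -1; break;
                case "right": dCol = 1; break;
                default:
                    throw new ArgumentException($"Unknown direction '{moves[i].Direction}'.");
            }

            // Walk one square at a time so the first bad step is caught.
            for (int step = 0; step < moves[i].Steps; step++)
            {
                row += dRow;
                col += dCol;
                if (row < 0 || row >= rows || col < 0 || col >= cols)
                    throw new ArgumentException(
                        $"Move {i + 1} ({moves[i].Direction} {moves[i].Steps}) leaves the grid.");
                grid[row, col] = i + 2;        // move i is statement (i + 2)
            }
        }
        return grid;
    }
}
```

A caller can catch the exception and use its message, or the index of the failing move, to decide which part of the source code to highlight.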

 

That’s all there is to making a simple domain-specific language parser with Irony! Please see Code Appendix One or the included source code to view what was discussed in the last few sections.

 

The next section will deal with using some of Irony’s more advanced tools, and then I will go into how to maintain a slightly more complex language called “Manchester Syntax” that I created during the summer of 2010.

 

V. Irony’s tools

See the README.txt file in the irony-XXXXX folder that you downloaded in section II for information on how to use Irony’s grammar explorer.