3 Tutorial

This short tutorial presents how to make a simple calculator. The calculator will compute basic mathematical expressions (+, -, *, /) possibly nested in parenthesis. We assume the reader is familiar with regular expressions.

3.2 Defining the grammar

Expressions are defined with a grammar. For example an expression is a term, a term is a sum of factors and a factor is a product of atomic expressions. An atomic expression is either a number or a complete expression in parenthesis.

We describe such grammars with rules. A rule describe the composition of an item of the language. In our grammar we have 3 items (Term, Factor, Atom). We will call these items ‘symbols’ or ‘non terminal symbols’. The decomposition of a symbol is symbolized with -->

. The grammar of this tutorial is given in figure 3.1.

Figure 3.1:

Grammar for expressions


Grammar rule	Description


Term Factor ((' + '\|'-') Factor)*	A term is a factor eventually followed with a plus ('+') or a minus ('-') sign and an other factor any number of times (* is a repetition of an expression 0 or more times).

Factor Atom ((''\|'/') Atom)	A factor is an atom eventually followed with a '*' or '/' sign and an other atom any number of times.

Atom number \| '(' Term ')'	An atomic expression is either a number or a term in parenthesis.

We have defined here the grammar rules (i.e. the sentences of the language). We now need to describe the lexical items (i.e. the words of the language). These words - also called terminal symbols - are described using regular expressions. In the rules we have written some of these terminal symbols (+,-,*,/,(,)). We have to define number. For sake of simplicity numbers are integers composed of digits (the corresponding regular expression can be [0 - 9]+). To simplify the grammar and then the Python script we define two terminal symbols to group the operators (additive and multiplicative operators). We can also define a special symbol that is ignored by TPG. This symbol is used as a separator. This is generaly usefull for white spaces and comments. The terminal symbols are given in figure 3.2

Figure 3.2:

Terminal symbol definition for expressions


Terminal symbol	Regular expression	Comment


number	[0 - 9]+ or \d+	One or more digits

add	[+-]	a + or a -

mul	[*/]	a * or a /

spaces	\s+	One or more spaces

This is sufficient to define our parser with TPG. The grammar of the expressions in TPG can be found in figure 3.3.

Figure 3.3:

Grammar of the expression recognizer

parser Calc:

    separator spaces: '\s+' ;
    token number: '\d+' ;
    token add: '[+-]' ;
    token mul: '[*/]' ;

    START -> Term ;

    Term -> Fact ( add Fact )* ;

    Fact -> Atom ( mul Atom )* ;

    Atom -> number | '\(' Term '\)' ;

Calc is the name of the Python class generated by TPG. START is a special non terminal symbol treated as the axiom¹ of the grammar.

With this small grammar we can only recognize a correct expression. We will see in the next sections how to read the actual expression and to compute its value.

3.3 Reading the input and returning values

The input of the grammar is a string. To do something useful we need to read this string in order to transform it into an expected result.

This string can be read by catching the return value of terminal symbols. By default any terminal symbol returns a string containing the current token. So the token '(' always returns the string '('. For some tokens it may be useful to compute a Python object from the token. For example number should return an integer instead of a string, add and mul should return a function corresponding to the operator. That why we will add a function to the token definitions. So we associate int to number and make_op to add and mul.

int is a Python function converting objects to integers and make_op is a user defined function (figure 3.4).

Figure 3.4:

make_op function

def make_op(s):
    return {
        '+': lambda x,y: x+y,
        '-': lambda x,y: x-y,
        '*': lambda x,y: x*y,
        '/': lambda x,y: x/y,
    }[s]

To associate a function to a token it must be added after the token definition as in figure 3.5

Figure 3.5:

Token definitions with functions

    separator spaces: '\s+' ;
    token number: '\d+' int ;
    token add: '[+-]' make_op;
    token mul: '[*/]' make_op;

We have specified the value returned by the token. To read this value after a terminal symbol is recognized we will store it in a Python variable. For example to save a number in a variable n we write number/n. In fact terminal and non terminal symbols can return a value. The syntax is the same for both sort of symbols. In non terminal symbol definitions the return value defined at the left hand side is the expression return by the symbol. The return values defined in the right hand side are just variables to which values are saved. A small example may be easier to understand (figure 3.6).

Figure 3.6:

Return values for (non) terminal symbols


Rule	Comment

X/x ->	Defines a symbol X. When X is called, x is returned.
Y/y	X starts with a Y. The return value of Y is saved in y.
Z/z	The return value of Z is saved in z.
{{ x = y+z }}	Computes x.
;	Returns x.

In the example described in this tutorial the computation of a Term is made by applying the operator to the factors, this value is then returned :

Term/t -> Fact/t ( add/op Fact/f {{ t = op(t,f) }} )* ;

This example shows how to include Python code in a rule. Here {{...}} is copied verbatim in the generated parser.

Figure 3.7:

Expression recognizer and evaluator

parser Calc:

    separator spaces: '\s+' ;

    token number: '\d+' int ;
    token add: '[+-]' make_op ;
    token mul: '[*/]' make_op ;

    START -> Term ;

    Term/t -> Fact/t ( add/op Fact/f {{ t = op(t,f) }} )* ;

    Fact/f -> Atom/f ( mul Atom/a {{ f = op(f,a) }} )* ;

    Atom/a -> number/a | '\(' Term/a '\)' ;

3.4 Embeding the parser in a script

To embed a TPG parser in a Python program, you only need the tpg.compile function. This function takes a grammar (in a string²) and returns a string containing the Python code for the parser. One way to use this parser is to exec its definition. A practical way to build parsers is to exec the result of tpg.compile (figure 3.8).

Figure 3.8:

Python code generation from a grammar

import tpg

exec(tpg.compile(r""" # Your grammar here """))

# You can instanciate your parser here

To use this parser you now just need to import tpg, compile the grammar and instanciate an object of the class Calc as in figure 3.9.

Figure 3.9:

Complete Python script with expression parser

import tpg

def make_op(s):
    return {
        '+': lambda x,y: x+y,
        '-': lambda x,y: x-y,
        '*': lambda x,y: x*y,
        '/': lambda x,y: x/y,
    }[s]

exec(tpg.compile(r"""

parser Calc:

    separator spaces: '\s+' ;

    token number: '\d+' int ;
    token add: '[+-]' make_op ;
    token mul: '[*/]' make_op ;

    START/e -> Term/e ;
    Term/t -> Fact/t ( add/op Fact/f {{ t = op(t,f) }} )* ;
    Fact/f -> Atom/f ( mul/op Atom/a {{ f = op(f,a) }} )* ;
    Atom/a -> number/a | '\(' Term/a '\)' ;

"""))

calc = Calc()
expr = raw_input('Enter an expression: ')
print expr, '=', calc(expr)

3.5 Conclusion

This tutorial shows some of the possibilities of TPG. If you have read it carefully you may be able to start with TPG. The next chapters present TPG more precisely. They contain more examples to illustrate all the features of TPG.

Chapter 3
Tutorial

3.1 Introduction

3.2 Defining the grammar

3.3 Reading the input and returning values

3.4 Embeding the parser in a script

3.5 Conclusion

Chapter 3Tutorial

3.1 Introduction

3.2 Defining the grammar

3.3 Reading the input and returning values

3.4 Embeding the parser in a script

3.5 Conclusion

Chapter 3
Tutorial