An introduction to ANTLRWorks

  1. Run ANTLRWorks,
    1. run: java -jar /opt/antlr3/antlrworks.jar
    2. do not fill in or send the questionnaire (the author no longer cares about it),
    3. choose the type of the file: *.g

      and create it

      Under Linux with KDE we cannot enter the grammar name at first. But if we click the 'OK' button, the program will tell us what it thinks about us.
      After that, typing in the 'name' field becomes possible.

    I advise you NOT to use spaces in names (of files or directories): they lead to trouble in the (near) future.
    Now choose the grammar type 'Combined Grammar'. There are also 'Lexer', 'Parser' and 'Tree' grammars, but today we need the first one.

    Additionally, we can choose some lexemes to create. Today we will need the ones underlined in the figure above. After clicking the 'OK' button we get the text of the lexical analyser with the chosen lexemes. Saving the file at this point is strongly recommended.

    The grammar will be translated to Java code, so the name of the grammar file must be the same as the grammar name.

    Above we have three panels. The first page is named 'Syntax Diagram'. It shows the diagram of the selected rule. The diagram can take two forms, depending on the 'NFA' radio button. Try it! The 'Rule Name' switch needs no explanation.

    The elements marked blue in the text (in curly brackets {}) are called 'actions'. Here, in lexical rules, they are called lexical actions. Actions in syntactic rules are called semantic actions because they play the role of the semantic analyser.

    The action $channel = HIDDEN sends the lexeme just recognized to the hidden channel, that is, hidden from the parser. Such lexemes are ignored, which also means that they must not be used in syntactic rules, because the parser will never see them (this IS important!).

    The second one, greedy = false, sets an option of the analyser. Placed here, it applies only to the nearest asterisk. False means that the pattern should be matched non-greedily. What does that mean? The dot matches any character, including '*' and '/'. Greedy matching would consume all characters up to the very last '*/' in the text. Non-greedy matching stops at the first '*/' sequence, which is exactly what we want here.
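
    The difference between greedy and non-greedy matching can also be tried outside ANTLR. The following is a minimal sketch that uses Java regular expressions as an analogy only (it is not ANTLR code, and the names GreedyDemo and firstMatch are made up for this illustration). In a Java regex '.*' is greedy and '.*?' is not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first substring of text matched by regex, or null if none.
    // DOTALL lets the dot match newlines too, so comments may span lines.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String src = "/* one */ 12 /* two */";
        // Greedy: runs to the very last "*/" and swallows the 12.
        System.out.println(firstMatch("/\\*.*\\*/", src));   // prints /* one */ 12 /* two */
        // Non-greedy: stops at the first "*/".
        System.out.println(firstMatch("/\\*.*?\\*/", src));  // prints /* one */
    }
}
```

    With the greedy pattern the number 12 would disappear inside one big comment; the non-greedy one leaves it for the parser.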
  2. Now let us define some syntactic rules.
    Usually we put the grammar (syntactic) rules before the lexical ones. It is not obligatory, but it is practical. Let us start by recognizing one number (an integer):
    atom : INT ;
    The rule should be read as "an atom looks like exactly one INT".
  3. Let us try this trivial grammar using the built-in Interpreter. Set atom as the starting rule.
    We entered two numbers as input, but our grammar recognized only the first of them.
  4. Why? Experiment by adding * just after INT and compare the result. Can we now write more than two numbers, and are they all recognized? Well, the asterisk means that we accept zero or more INTs as the atom. Zero! Compare the behaviour of the parser with and without the asterisk when the input is empty. If you do not want empty input to be accepted, put + instead of *. It means "one or more".
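
    The same "zero or more" versus "one or more" distinction exists in regular expressions, so it can be checked quickly without ANTLR. This is only an analogy, sketched under the assumption that numbers are separated by spaces; the class RepetitionDemo and its members are hypothetical names, and the two patterns merely stand in for the rules with INT* and INT+.

```java
import java.util.regex.Pattern;

class RepetitionDemo {
    // Regex stand-ins for "atom : INT* ;" and "atom : INT+ ;".
    static final Pattern ZERO_OR_MORE = Pattern.compile("(\\d+\\s*)*");
    static final Pattern ONE_OR_MORE = Pattern.compile("(\\d+\\s*)+");

    // True when the whole input is accepted by the pattern.
    static boolean accepts(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(ZERO_OR_MORE, "12 34 56")); // prints true
        System.out.println(accepts(ONE_OR_MORE, "12 34 56"));  // prints true
        System.out.println(accepts(ZERO_OR_MORE, ""));         // prints true
        System.out.println(accepts(ONE_OR_MORE, ""));          // prints false
    }
}
```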

    Try this rule:
    atom
    	:	INT
    	|	LP atom RP
    	;
    This recognizes one integer, possibly enclosed in a pair of parentheses, or even in several pairs. Some lexemes in the rule are marked red, which suggests errors. Indeed, these lexemes are undefined. In the figures below it is PLUS that is undefined, but the same holds for LP and RP (left and right parentheses). Using this as an example, try to define them and test the parser.

    ANTLRWorks helps us fix such errors. Of course it cannot guess what WE mean by 'PLUS' or 'LP', but at least it creates the lexeme.

    We should say what it looks like:
    LP
    	: '('
    	;

    In the same way we can define the other lexemes we need. 'EOF' must not be defined, because it is predefined.
    plik : (expr NL+)* EOF ;
    expr : term (PLUS term | MINUS term)* ;
    term : atom (MUL atom | DIV atom)* ;
    atom : INT | LP expr RP ;
    PLUS : '+' ;
    MINUS : '-' ;
    MUL : '*' | '.' ; // asterisk or dot as "times" sign
    DIV : '/' | ':' ;
    NL : '\n' ;
    LP : '(' ;
    RP : ')' ;

    Now we have two different lexemes that share part of their definition (both NL and WS match '\n'):

    I suggest removing '\n' from the 'WS' definition.
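
    The rules expr, term and atom map directly onto a hand-written recursive-descent recognizer, which may help in reading the parse trees shown later. The Java sketch below is an illustration only and not the code ANTLR generates: the class CalcSketch and its methods are hypothetical, it reads characters directly instead of using a separate lexer, it skips only spaces, and it handles a single expr rather than the whole 'plik' rule.

```java
class CalcSketch {
    private final String src;
    private int pos = 0;

    CalcSketch(String src) { this.src = src; }

    // Parses one complete expression and returns its parse tree as text.
    static String parse(String input) {
        CalcSketch p = new CalcSketch(input);
        String tree = p.expr();
        p.skipSpaces();
        if (p.pos < p.src.length()) throw p.error("end of input");
        return tree;
    }

    // expr : term (PLUS term | MINUS term)* ;
    private String expr() {
        StringBuilder sb = new StringBuilder("(expr ").append(term());
        while (peek() == '+' || peek() == '-') {
            sb.append(' ').append(next()).append(' ').append(term());
        }
        return sb.append(')').toString();
    }

    // term : atom (MUL atom | DIV atom)* ;   MUL is '*' or '.', DIV is '/' or ':'
    private String term() {
        StringBuilder sb = new StringBuilder("(term ").append(atom());
        while ("*./:".indexOf(peek()) >= 0) {
            sb.append(' ').append(next()).append(' ').append(atom());
        }
        return sb.append(')').toString();
    }

    // atom : INT | LP expr RP ;
    private String atom() {
        char c = peek();
        if (c == '(') {
            next();
            String inner = expr();
            if (peek() != ')') throw error("')'");
            next();
            return "( " + inner + " )";
        }
        if (!Character.isDigit(c)) throw error("INT or '('");
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return src.substring(start, pos);
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // Looks at the next non-space character without consuming it ('\0' at the end).
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // Consumes one character; call only after peek() has found one.
    private char next() { return src.charAt(pos++); }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("expected " + expected + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(parse("1+2*3")); // prints (expr (term 1) + (term 2 * 3))
    }
}
```

    Because term is called from expr, and atom from term, multiplication and division bind tighter than addition and subtraction; this is visible in the nesting of the printed tree.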
  5. Now we have the grammar of a four-operation calculator (with parentheses).

    Let us check how the parser works.
    Open the 'Interpreter' page (at the bottom). Choose the appropriate starting rule. In our example it is 'plik' (Polish for 'file').

    After entering the test string and clicking the arrow, we obtain the result in the form of a parse tree.

    We can write another example and obtain another parse tree:

    Oops? Do not worry, it is a well-known bug in the interpreter. Just put a second pair of parentheses in every rule where an alternative is followed by the asterisk or the plus sign.
    expr
    	: term ((PLUS term
    	| MINUS term))* NL
    	;

  6. The Console page
    It is an important page because error messages appear here. These messages are never cleared automatically; you should clear them before each build or run so that you do not fight against old errors.
  7. Debugger
    To run it, click the ugly green bug on the toolbar, choose 'Debug' from the 'Run' menu, or press Ctrl-D.

    The program will be compiled, but errors may appear. In that case the following message box is shown:

    The errors should be corrected and the compilation repeated, and so on. When it succeeds, the following window appears:

    We can enter the test string or the name of a test file in it. Choose the correct starting rule! Remember that in our example the rule is plik. After we click 'OK', the debug page is shown. On this page we can run the program step by step or continuously. During or after debugging we can select an element in the parse tree or in the input stream. The corresponding elements of the parse tree, the input stream and the grammar will then be marked. I am speaking about the elements marked blue in the following figure. Yes, dear Girls, I am speaking about the color which you would call indigo, azure, cyan, turquoise or something else, but not blue. Yes, dear Geek, I am speaking about the color #99ccfe, which has no PANTONE code.

    The red cursor shows the current position in the program.

to be continued