Tutorial 28 solution

Computer Science 301 - 2005

Extra tutorials for Swot Week (2) - Adding actions to grammars

The following Cocol grammar may be familiar. It describes a set of EBNF productions that can incorporate Wirth's metabrackets { } [ and ].

    COMPILER EBNF $CN
    /* Parse a set of EBNF productions
       P.D. Terry, Rhodes University, 2005 */

    CHARACTERS
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noquote1 = ANY - "'" .
      noquote2 = ANY - '"' .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

    COMMENTS FROM "<" TO ">"

    IGNORE  CHR(9) .. CHR(13)

    PRODUCTIONS
       EBNF       = { Production } EOF .
       Production = nonterminal "=" Expression "." .
       Expression = Term { "|" Term } .
       Term       = Factor { Factor } .
       Factor     =   nonterminal
                    | terminal
                    | "[" Expression "]"
                    | "(" Expression ")"
                    | "{" Expression "}" .
    END EBNF.

Show how the productions would be attributed so as to parse a set of productions given in the Wirth notation and reproduce them one to a line in the alternative EBNF notation that uses Kleene closure symbol * and e. For example, a production like

Program = [ Header ] { Statement } .

should be transformed to

Program = ( Header | e ) ( Statement )* .

You can find a Java "kit" for the following solutions on the course web page as the file tut28.zip.

Conversion of the productions is very easy; one simply writes the tokens as they are parsed, interspersed with blank characters, and substituting one form of bracketing for another. A simple solution simply ignores comments:

    import Library.*;

    COMPILER EBNF1 $CN
    /* Parse a set of EBNF productions and convert to one form of EBNF

       - This version simply ignores comments

       P.D. Terry, Rhodes University, 2005 */

    CHARACTERS
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noquote1 = ANY - "'" .
      noquote2 = ANY - '"' .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

    COMMENTS FROM "<" TO ">"

    IGNORE  CHR(9) .. CHR(13)

    PRODUCTIONS
       EBNF1
       = { Production } EOF .

       Production
       = nonterminal           (. IO.write(token.val + " "); .)
       "="                     (. IO.write("= "); .)

       Expression
       "."                     (. IO.writeLine(". "); .)
       .

       Expression
       = Term
       { "|"                   (. IO.write("| "); .)
       Term } .

       Term
       = Factor { Factor } .

       Factor
       =   nonterminal         (. IO.write(token.val + " "); .)
         | terminal            (. IO.write(token.val + " "); .)
         | "["                 (. IO.write("( "); .)
           Expression "]"      (. IO.write("| eps ) "); .)
         | "("                 (. IO.write("( "); .)
           Expression
           ")"                 (. IO.write(") "); .)
         | "{"                 (. IO.write("( "); .)
           Expression
           "}"                 (. IO.write(")* "); .)
         .
    END EBNF1.

To be able to retain the < comments in a set of productions > one has to resort to the idea of declaring a suitable token pattern for them as a pragma (see page 250). A similar trick can be used to retain line breaks.

At first this seems very simple, leading to the following code. Note that the text of a pragma is stored in the "look ahead token" (la.val), not the most recent token (token.val).

    import Library.*;

    COMPILER EBNF2 $CN
    /* Parse a set of EBNF productions and convert to another form of EBNF
       using the Kleene star and explicit eps

       - This version handles comments and line ends, but writes them in the wrong places!

       P.D. Terry, Rhodes University, 2005 */

    CHARACTERS
      letter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline   = "_" .
      digit     = "0123456789" .
      noquote1  = ANY - "'" .
      noquote2  = ANY - '"' .
      inComment = ANY - ">" .
      eol       = CHR(10) .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

    PRAGMAS
      Comments = "<" { inComment } ">" .  (. IO.write(" " + la.val + " "); .)
      EOL      = eol .                    (. IO.writeLine(); .)

    IGNORE  CHR(0) .. CHR(31) - CHR(10)

    PRODUCTIONS
       EBNF2
       = { Production } EOF .

       Production
       = nonterminal           (. IO.write(token.val, -14); .)
       "="                     (. IO.write(" = "); .)

       Expression
       "."                     (. IO.writeLine("."); .)
       .

       Expression
       = Term
       { "|"                   (. IO.write("| "); .)
       Term } .

       Term
       = Factor { Factor } .

       Factor
       =   nonterminal         (. IO.write(token.val + " "); .)
         | terminal            (. IO.write(token.val + " "); .)
         | "["                 (. IO.write("( "); .)
           Expression "]"      (. IO.write("| eps ) "); .)
         | "("                 (. IO.write("( "); .)
           Expression
           ")"                 (. IO.write(") "); .)
         | "{"                 (. IO.write("( "); .)
           Expression
           "}"                 (. IO.write(")* "); .)
         .
    END EBNF2.

However, the "look ahead" property requires a more subtle trick - we need to delay writing the pragma until after the next token has been parsed. The trick is to accumulate the pragma text into a string and redirect all output operations through a simple method that will output the token followed by the pragma!

    import Library.*;

    COMPILER EBNF3 $CN
    /* Parse a set of EBNF productions and convert to another form of EBNF
       using the Kleene star and explicit eps

       - This version handles comments and line ends, adds them to a pragma string
         so that they can be written in the correct places

       P.D. Terry, Rhodes University, 2005 */

    static String pragmaStr = "";  /* starts empty */

    static void write(String str) {
      IO.write(str + pragmaStr);
      pragmaStr = "";  /* reset the pragma string */
    }

    CHARACTERS
      letter    = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline   = "_" .
      digit     = "0123456789" .
      noquote1  = ANY - "'" .
      noquote2  = ANY - '"' .
      inComment = ANY - ">" .
      eol       = CHR(10) .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

    PRAGMAS
      Comments = "<" { inComment } ">" .  (. pragmaStr += " " + la.val + " "; .)
      EOL      = eol .                    (. pragmaStr += "\r\n"; .)

    IGNORE  CHR(0) .. CHR(31) - CHR(10)

    PRODUCTIONS
       EBNF3
       = { Production } EOF .

       Production
       = nonterminal           (. write(token.val); .)
       "="                     (. write(" = "); .)

       Expression
       "."                     (. write(".\r\n"); .)
       .

       Expression
       = Term
       { "|"                   (. write("| "); .)
       Term } .

       Term
       = Factor { Factor } .

       Factor
       =   nonterminal         (. write(token.val + " "); .)
         | terminal            (. write(token.val + " "); .)
         | "["                 (. write("( "); .)
           Expression "]"      (. write("| eps ) "); .)
         | "("                 (. write("( "); .)
           Expression
           ")"                 (. write(") "); .)
         | "{"                 (. write("( "); .)
           Expression
           "}"                 (. write(")* "); .)
         .

    END EBNF3.