Tutorial 5

Computer Science 301 - Translators

Tutorial 5 - Practice with Coco and grammars

1 Describing sets of EBNF productions in other ways

It is important to get some idea of how and when to use recursive grammars and how and when to use Wirth/Coco style grammars and, given one, to be able to convert it to the other form.

Here is a Cocol grammar that describes EBNF using EBNF conventions, based on that on page 104 of the textbook:

    COMPILER EBNF $CN
    /* Parse a set of EBNF productions
       P.D. Terry, Rhodes University, 2015 */

    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noQuote1 = ANY - "'" - control .
      noQuote2 = ANY - '"' - control .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noQuote1 { noQuote1 } "'" | '"' noQuote2 { noQuote2 } '"' .

    COMMENTS FROM "(*" TO "*)"  NESTED

    IGNORE control

    PRODUCTIONS
      EBNF       = { Production } EOF .
      Production = nonterminal "=" Expression SYNC "." .
      Expression = Term { "|" Term } .
      Term       = Factor { Factor } .
      Factor     =   nonterminal
                   | terminal
                   | "[" Expression "]"
                   | "(" Expression ")"
                   | "{" Expression "}" .
    END EBNF.

Use this grammar as a guide to develop a new system that will recognize or reject a set of productions like those below, but with your Cocol grammar written in such a way that it does not itself use any of the Wirth "meta brackets" { } and [ ] (although, as here, these may appear in the productions that your parser is called on to analyse). As a hint, you will have to set up an equivalent grammar, using right-recursive production rules.

    Goal  = { One } .
    One   = Two "plus" Four "." .
    Four  = Five {"is" Five | ( Six Seven ) } .
    Five  = Six [ Six ] .
    Six   = Two | Three | "(*" Four "*)" | '(' Four ')' .

2 Phools rush in (where angels fear to tread, apparently)

(This is based on a problem set at the 1994 ACM International Collegiate Programming Contest.)

In the land of Phools the official language is Phoolish. A Phoolish professor noticed that many of his students had not mastered the syntax of Phoolish well. Tired of correcting their many syntactical mistakes, he decided to challenge the students and asked them to write a program that could check the syntactical correctness of any sentence they wrote.

In line with the nature of Phools, the syntax of Phoolish is also pleasantly simple. Here are the rules:

0. The characters in the language are the 26 letters of the standard western alphabet and the language distinguishes between lower and upper case letters.

1. Every lower case letter is a simple correct sentence.

2. If σ is a correct sentence, then so is a sentence consisting of an upper case vowel followed by σ, that is Aσ Eσ Iσ Oσ or Us

3. If σ and τ are correct sentences, then so is a sentence consisting of an upper case consonant followed by στ, for example Bστ Cστ Dστ ... Zστ.

4. Rules 0 to 3 are the only rules that determine the syntactical correctness of a sentence.

5. As a special case, the character "." is used to terminate a sequence of sentences.

Develop a Cocol grammar that describes a sequence of Phoolish sentences, and test it out on some snippets of Phoolish language.