Tutorial 5 - Solutions

Computer Science 301 - Translators

Tutorial 5 - Practice with Coco and grammars - Solutions

Describing sets of EBNF productions in other ways

It is important to get some idea of how and when to use recursive grammars and how and when to use Wirth/Coco style grammars and, given one, to be able to convert it to the other form.

Here is a Cocol grammar that describes EBNF using EBNF conventions, based on that on page 104 of the textbook:

    COMPILER EBNF $CN
    /* Parse a set of EBNF productions
       P.D. Terry, Rhodes University, 2015 */

    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noQuote1 = ANY - "'" - control .
      noQuote2 = ANY - '"' - control .

    TOKENS
      nonterminal = letter { letter | lowline | digit } .
      terminal    = "'" noQuote1 { noQuote1 } "'" | '"' noQuote2 { noQuote2 } '"' .

    COMMENTS FROM "(*" TO "*)"  NESTED

    IGNORE control

    PRODUCTIONS
      EBNF       = { Production } EOF .
      Production = nonterminal "=" Expression SYNC "." .
      Expression = Term { "|" Term } .
      Term       = Factor { Factor } .
      Factor     =   nonterminal
                   | terminal
                   | "[" Expression "]"
                   | "(" Expression ")"
                   | "{" Expression "}" .
    END EBNF.

Use this grammar as a guide to develop a new system that will recognize or reject a set of productions like those below, but with your Cocol grammar written in such a way that it does not itself use any of the Wirth "meta brackets" { } and [ ] (although, as here, these may appear in the productions that your parser is called on to analyse). As a hint, you will have to set up an equivalent grammar, using right-recursive production rules.

    Goal  = { One } .
    One   = Two "plus" Four "." .
    Four  = Five {"is" Five | ( Six Seven ) } .
    Five  = Six [ Six ] .
    Six   = Two | Three | "(*" Four "*)" | '(' Four ')' .

As hinted in the problem description, the trick here is to replace the productions that use meta-braces for repetition and meta-brackets for optional selection by ones that use right recursion (to avoid LL(1) violations).

Here is one possibility:

    COMPILER EBNF1 $CN
    /* Recognize a set of EBNF productions
       (Does not permit empty terms)
       P.D. Terry, Rhodes University, 2015 */

    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      lowline  = "_" .
      digit    = "0123456789" .
      noQuote1 = ANY - "'" - control .
      noQuote2 = ANY - '"' - control .

    TOKENS
      nonTerminal = letter { letter | lowline | digit } .
      terminal    = "'" noQuote1 { noQuote1 } "'" | '"' noQuote2 { noQuote2 } '"' .

    COMMENTS FROM "(*" TO "*)"  NESTED

    IGNORE control

    PRODUCTIONS
      EBNF1       = Productions EOF .
      Productions = Production Productions | .
      Production  = nonTerminal "=" Expression "." .
      Expression  = Term MoreTerms .
      MoreTerms   = "|" Term MoreTerms | .
      Term        = Factor MoreFactors .
      MoreFactors = Factor MoreFactors | .
      Factor      =   nonTerminal | terminal
                    | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" .
    END EBNF1.

This can be further rearranged to give an even more compact grammar (you might like to ponder whether it shows any differences from the original one so far as associativity and precedence are concerned).

    PRODUCTIONS
      EBNF2       = Productions EOF .
      Productions = Production Productions | .
      Production  = nonTerminal "=" Expression "." .
      Expression  = Term MoreTerms .
      MoreTerms   = "|" Expression | .
      Term        = Factor MoreFactors .
      MoreFactors = Term | .
      Factor      =   nonTerminal | terminal
                    | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" .
    END EBNF2.

The above grammars match the ones given in the problem description. They do not, however, have the property of being able to describe themselves any longer - we might argue that we need to be able to describe a nullable factor (the original could not do this). This might be achieved as follows:

    PRODUCTIONS
      EBNF3       = Productions EOF .
      Productions = Production Productions | .
      Production  = nonTerminal "=" Expression "." .
      Expression  = Term MoreTerms .
      MoreTerms   = "|" Expression | .
      Term        = Factor Term | .
      Factor      =   nonTerminal | terminal
                    | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" .
    END EBNF3.

but notice that this also allows you to accept production rules like

A = b | c | | | .

which you might argue is a bit silly. Further reflection on this is left as a useful exercise!

Phools rush in (where angels fear to tread, apparently)

(This is based on a problem set at the 1994 ACM International Collegiate Programming Contest.)

In the land of Phools the official language is Phoolish. A Phoolish professor noticed that many of his students had not mastered the syntax of Phoolish well. Tired of correcting their many syntactical mistakes, he decided to challenge the students and asked them to write a program that could check the syntactical correctness of any sentence they wrote.

In line with the nature of Phools, the syntax of Phoolish is also pleasantly simple. Here are the rules:

0. The characters in the language are the 26 letters of the standard western alphabet and the language distinguishes between lower and upper case letters.

1. Every lower case letter is a simple correct sentence.

2. If σ is a correct sentence, then so is a sentence consisting of an upper case vowel followed by σ, that is Aσ Eσ Iσ Oσ or Us

3. If σ and τ are correct sentences, then so is a sentence consisting of an upper case consonant followed by στ, for example Bστ Cστ Dστ ... Zστ.

4. Rules 0 to 3 are the only rules that determine the syntactical correctness of a sentence.

5. As a special case, the character "." is used to terminate a sequence of sentences.

Develop a Cocol grammar that describes a sequence of Phoolish sentences, and test it out on some snippets of Phoolish language.

The answer is, perhaps, almost deceptively simple!

    COMPILER Phoolish $CN
    /* Grammar for a very Phoolish language
       P.D. Terry, Rhodes University, 2015 */

    CHARACTERS
      letter    = "abcdefghijklmnopqrstuvwxyz" .
      consonant = "BCDFGHJKLMNPQRSTVWXYZ" .
      vowel     = "AEIOU" .

    TOKENS
      lLetter    = letter .
      uConsonant = consonant .
      uVowel     = vowel .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Phoolish = { Sentence } "." .
      Sentence = lLetter | uVowel Sentence | uConsonant Sentence Sentence .
    END Phoolish.

The exercise is really one of understanding how to convert a verbal description into terse code, and experience has shown that it is easy to misinterpret the question. A complete sentence can follow an uppercase vowel, and two sentences can follow an uppercase consonant.

Thus, for example, while a Ab Cde are valid simpler sentences, so too are U Ab (Vowel Sentence) and W UAb Cde (Consonant Sentence Sentence). All very phoolish, you will agree!

Notice that the names of character sets cannot appear in the PRODUCTIONS section, seductive though this may appear. So the following is incorrect:

    COMPILER Phoolish $CN
    /* INCORRECT Grammar for a very Phoolish language
       P.D. Terry, Rhodes University, 2015 */

    CHARACTERS
      letter    = "abcdefghijklmnopqrstuvwxyz" .
      consonant = "BCDFGHJKLMNPQRSTVWXYZ" .
      vowel     = "AEIOU" .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS  /* INCORRECT !!!!!!!! */
      Phoolish = { Sentence } "." .
      Sentence = letter | vowel Sentence | consonant Sentence Sentence .
    END Phoolish.

However, one could use the following, where the letters are inserted into the production rules, bypassing the need for the CHARACTERS and TOKENS sections! But this is, perhaps, not the best way to do it.

    COMPILER Phoolish $CN
    /* P.D. Terry, Rhodes University, 2015 */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS  /* INCORRECT !!!!!!!! */
      Phoolish = { Sentence } "." .
      Sentence = Letter | Vowel Sentence | Consonant Sentence Sentence .

      Letter    =   "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m"
                  | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" .
      Consonant =   "B" | "C" | "D" | "F" | "G" | "H" | "J" | "K" | "L" | "M" | "N"
                  | "P" | "Q" | "R" | "S" | "T" | "V" | "W" | "X" | "Y" | "Z" .
      Vowel     =   "A" | "E" | "I" | "O" | "U" .
    END Phoolish.