Computer Science 301 - 2010

Tutorial for week 21 - Solutions - simple grammars

(a) One simple grammar for describing expressions was discussed in class:

      Goal       = Expression .
      Expression = Term | Expression "-" Term .
      Term       = Factor | Term "*" Factor .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b - c"

                            Goal
                              |
                          Expression
                              |
                  -------------------
                  |           |     |
             Expression       |   Term
                  |           |     |
          ----------------    |     |
          |       |      |    |     |
       Expression |    Term   |   Factor
          |       |      |    |     |
        Term      |    Factor |     |
          |       |      |    |     |
       Factor     |      |    |     |
          |       |      |    |     |
          a       -      b    -     c


(b) Another grammar is

      Goal       = Expression .
      Expression = Term | Term "-" Expression .
      Term       = Factor | Factor "*" Term .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b - c" and discuss the differences between this grammar and the last.

                            Goal
                              |
                          Expression
                              |
                  ------------------------------
                  |           |                |
                Term          |           Expression
                  |           |                |
                  |           |        ----------------
                  |           |        |       |      |
                Factor        |     Expression |    Term
                  |           |        |       |      |
                  |           |      Term      |    Factor
                  |           |        |       |      |
                  |           |     Factor     |      |
                  |           |        |       |      |
                  a           -        b       -      c

The two grammars are "equivalent" in the sense that any string that can be derived from the goal of one can be derived from the goal of the other. If and when we start applying "meaning" by applying some tree walking algorithm from the nodes to the root to get the meaning of a complete expression we would find that there are differences. The left-recursive grammar (the first one) provides "left to right" associativity - so that, for example "a-b-c" "means" (a-b)-c, whereas the right recursive grammar provides "right to left" associativity - so that "a-b-c" would "mean" a-(b-c), which would be at variance with the usual mathematical convention.


(c) Yet another grammar for describing expressions might be:

      Goal       = Expression .
      Expression = Term | Expression "*" Term .
      Term       = Factor | Term "-" Factor .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b * c"

                            Goal
                              |
                          Expression
                              |
                  -------------------
                  |           |     |
             Expression       |   Term
                  |           |     |
                 Term         |     |
          ----------------    |     |
          |       |      |    |     |
        Term      |    Factor |  Factor
          |       |      |    |     |
       Factor     |      |    |     |
          |       |      |    |     |
          a       -      b    *     c

Once again we have an equivalent grammar, in the sense that all three grammars can derive the same set of strings. But in this case it should be readily seen that the "precedence" of the - operator is now greater than the precedence of the * operator, which, again, would conflict with mathematical convention.

Even simpler equivalent grammars - but with all operators of equal precedence - are

      Goal       = Expression .
      Expression = Expression "*" Term | Expression "-" Term | Term
      Term       = "a" | "b" | "c" | "d" .

or even

      Goal       = Expression .
      Expression = Term "*" Expression | Term "-" Expression | Term
      Term       = "a" | "b" | "c" | "d" .

(you might like to wonder whether these have any semantic differences in the form of the trees they produce).


(d) How does one find a grammar that will allow all four operators (+ - * and /) with the familiar precedence and associativity conventions, and will also allow for parentheses within expressions - so that one can derive strings like (a - b) * (c + d)?

Guided by observations on the above grammars that (a) left recursion seems to handle left-to-right associativity and (b) that lower precedence operators seem to have to come "higher" in the scheme of productions, it should not take much to appreciate that the solution will be:

      Goal       = Expression .
      Expression = Term | Expression "+" Term | Expression "-" Term.
      Term       = Factor | Term "*" Factor | Term "/" Factor .
      Factor     = "(" Expression ")" | "a" | "b" | "c" | "d" .


(e) Here is the sort of solution one might suggest for a grammar to describe the contents pages of a book. We can recognise phrase structure concepts like Title, Author, Chapter heading, Subsection headings and so on, so introduce non-terminals for these into the PRODUCTIONS section. We can also recognise lexical concepts like word and number - in fact, two kinds of numbers - so we introduce these into the TOKENS section based on pretty obvious CHARACTER sets. Finally, there are some candidates for "key words" - terminals like "Chapter" and "by". Of course, these last words might appear in the "wrong" places quite legitimately.

3.1 Revisit the key concepts in Chapter 2.4 one by one

but those are the sort of complications that are not in the "spirit" of this silly example, and one does not expect beginners to worry too much about them - please accept that some of these exercises are a bit artificial!

    COMPILER Contents1 $NC
    /* Describe the contents pages of a book
       P.D. Terry, Rhodes University, 2010 */

    CHARACTERS
      uLetter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      lLetter = "abcdefghijklmnopqrstuvwxyz" .
      letter  = uLetter + lLetter .
      digit   = "0123456780" .

    TOKENS
      word    = letter { letter } .  /* any letter will do, but there must be at least one */
      number  = digit { digit } .    /* for numbering chapters */
      section = digit { digit } "." digit { digit } . /* for subsections */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Contents1  = Title "by" Author { Chapter } EOF .
      Title      = word { word } "." .
      Author     = word { word } "." .
      Chapter    = "Chapter" number Title { Subsection } .
      Subsection = section word { word } . /* notice no final "." */
    END Contents1.

There are, obviously, other ways of doing this. For example, since the combination word { word } comes up so often, one might use

    PRODUCTIONS
      Contents2  = Sentence "by" Sentence { Chapter } EOF .
      Sentence   = Words "." .
      Words      = word { word } .
      Chapter    = "Chapter" number Sentence { Subsection } .
      Subsection = section Words . /* notice no final "." */
    END Contents2.

or

    PRODUCTIONS
      Contents3  = Words "." "by" Words "." { Chapter } EOF .
      Chapter    = "Chapter" number Words "." { section Words } .
      Words      = word { word } .
    END Contents3.

One might want to insist that the first word in a title or section heading had to start with an uppercase letter. This gets a bit messier, but something like this might come to mind:

    COMPILER Contents4 $NC
    /* Describe the contents pages of a book
       P.D. Terry, Rhodes University, 2010 */

    CHARACTERS
      uLetter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      lLetter = "abcdefghijklmnopqrstuvwxyz" .
      letter  = uLetter + lLetter .
      digit   = "0123456780" .

    TOKENS
      uWord   = uLetter { letter } .  /* must start with an upper case letter */
      lWord   = lLetter { letter } .  /* must start with a lower case letter */
      number  = digit { digit } .     /* for numbering chapters */
      section = digit { digit } "." digit { digit } . /* for subsections */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Contents4  = Sentence "by" Sentence { Chapter } EOF .
      Sentence   = Words "." .
      Words      = uWord { uWord | lWord } .
      Chapter    = "Chapter" number Sentence { Subsection } .
      Subsection = section Words . /* notice no final "." */
    END Contents4.

Two frequent mistakes that beginners make at this stage of their development are (a) to use non-terminals where tokens would be better:

          Subsection = number "." number Words .  /* don't do this  - 3.4 is one token! */

and (b) to add punctuation like full stops into the token definition for words or numbers in the wrong way

    TOKENS   /* if necessary or useful */
      word    = letter { letter | "."  } .  /* would allow hello.word as one word */
      number  = digit { digit | "." } .     /* would allow 34...56 as one number */