Computer Science 301 - 2007

Tutorial for week 21 - Solutions - simple grammars

One simple grammar for describing expressions was discussed in class:

      Goal       = Expression .
      Expression = Term | Expression "-" Term .
      Term       = Factor | Term "*" Factor .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b - c"

                            Goal
                              |
                          Expression
                              |
                  -------------------
                  |           |     |
             Expression       |   Term
                  |           |     |
          ----------------    |     |
          |       |      |    |     |
       Expression |    Term   |   Factor
          |       |      |    |     |
        Term      |    Factor |     |
          |       |      |    |     |
       Factor     |      |    |     |
          |       |      |    |     |
          a       -      b    -     c


Another grammar is

      Goal       = Expression .
      Expression = Term | Term "-" Expression .
      Term       = Factor | Factor "*" Term .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b - c" and discuss the differences between this grammar and the last.

                            Goal
                              |
                          Expression
                              |
                  ------------------------------
                  |           |                |
                Term          |           Expression
                  |           |                |
                  |           |        ----------------
                  |           |        |       |      |
                Factor        |     Expression |    Term
                  |           |        |       |      |
                  |           |      Term      |    Factor
                  |           |        |       |      |
                  |           |     Factor     |      |
                  |           |        |       |      |
                  a           -        b       -      c

The two grammars are "equivalent" in the sense that any string that can be derived from the goal of one can be derived from the goal of the other. If and when we start applying "meaning" by applying some tree walking algorithm from the nodes to the root to get the meaning of a complete expression we would find that there are differences. The left-recursive grammar (the first one) provides "left to right" associativity - so that, for example "a-b-c" "means" (a-b)-c, whereas the right recursive grammar provides "right to left" associativity - so that "a-b-c" would "mean" a-(b-c), which would be at variance with the usual mathematical convention.


Yet another grammar for describing expressions might be:

      Goal       = Expression .
      Expression = Term | Expression "*" Term .
      Term       = Factor | Term "-" Factor .
      Factor     = "a" | "b" | "c" | "d" .

Show the parse tree corresponding to the string "a - b * c"

                            Goal
                              |
                          Expression
                              |
                  -------------------
                  |           |     |
             Expression       |   Term
                  |           |     |
                 Term         |     |
          ----------------    |     |
          |       |      |    |     |
        Term      |    Factor |  Factor
          |       |      |    |     |
       Factor     |      |    |     |
          |       |      |    |     |
          a       -      b    *     c

Once again we have an equivalent grammar, in the sense that all three grammars can derive the same set of strings. But in this case it should be readily seen that the "precedence" of the - operator is now greater than the precedence of the * operator, which, again, would conflict with mathematical convention.

Even simpler equivalent grammars - but with all operators of equal precedence - are

      Goal       = Expression .
      Expression = Expression "*" Term | Expression "-" Term | Term
      Term       = "a" | "b" | "c" | "d" .

or even

      Goal       = Expression .
      Expression = Term "*" Expression | Term "-" Expression | Term
      Term       = "a" | "b" | "c" | "d" .

(you might like to wonder whether these have any semantic differences in the form of the trees they produce).


How does one find a grammar that will allow all four operators (+ - * and /) with the familiar precedence and associativity conventions, and will also allow for parentheses within expressions - so that one can derive strings like (a - b) * (c + d)?

Guided by observations on the above grammars that (a) left recursion seems to handle left-to-right associativity and (b) that lower precedence operators seem to have to come "higher" in the scheme of productions, it should not take much to appreciate that the solution will be:

      Goal       = Expression .
      Expression = Term | Expression "+" Term | Expression "-" Term.
      Term       = Factor | Term "*" Factor | Term "/" Factor .
      Factor     = "(" Expression ")" | "a" | "b" | "c" | "d" .


If we wish to allow for leading (unary) operators, as in expressions like "- a + ( -b + ( + c / d ))" there are two approaches that are often followed.

      Goal       = Expression .
      Expression = Term | Expression "+" Term | Expression "-" Term.
      Term       = Factor | Term "*" Factor | Term "/" Factor .
      Factor     = [ "+" | "-" ] Factor  | Primary .
      Primary    = "(" Expression ")" | "a" | "b" | "c" | "d" .
or
      Goal       = Expression .
      Expression = [ "+" | "-" ] ( Term | Expression "+" Term | Expression "-" Term ).
      Term       = Factor | Term "*" Factor | Term "/" Factor .
      Factor     = "(" Expression ")" | "a" | "b" | "c" | "d" .
(You might like to puzzle out whether these are really equivalent, and if not, why not.)


Develop a Cocol grammar to describe a list of names of staff who have titles, first names and/or initials, surnames, and (usually) qualifications, for example

    Professor Peter Clayton, PhD .
    Professor P. D. Terry, MSc, PhD .
    George Clifford Wells, BSc(Hons), MSc, PhD .
    Greg G. Foster, PhD .
    Dr Karen Lee Bradshaw, PhD .
    Mrs Madeleine Wright, MSc .
    B. Cleo C. Biko, BCom, BEd .
    Dr Dameon Wagner, PhD .
    Ms Cheryl Fischer .
This could be done in several ways. the important thing is to try to introduce non-terminals for the items of semantic interest:
  COMPILER Staff

  CHARACTERS
    uletter  = "ABCDEFGHIJKLMNOPQRSTUVWZYZ" .
    lletter  = "abcdefghijklmnopqrstuvwzyz" .

  TOKENS
    Name     = uletter { uletter | lletter | "'" uletter | "-" uletter } .
    Initial  = uletter "." .

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Staff    = { Person } .
    Person   = [ Title ] { Name | Initial } Surname { "," Degree } "." .
    Title    = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" .
    Surname  = Name .
    Degree   = "BA" | "BSc" | "BCom" | "BSc(Hons)" | "BCom(Hons)" | "MA" | "MSc" | "BEd" .
               /* and others like this */
  END Staff.
A point of interest is that this correctly describes a staff list, but is actually difficult to use, as surnames and names are lexically indistinguishable. This is an example of a so-called "LL(1)" problem, a topic which we shall discuss later in the course.