Computer Science 3 - 2008 - Test on Prac 22

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood the material in the prac questions. You may not use your textbook, prac solutions or a computer. Answer on the question sheet (4 sides).

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. (Grammars) Formally, a grammar G is defined by a quadruple { N, T, S, P } with the four components

(a) N - a finite set of non-terminal symbols,
(b) T - a finite set of terminal symbols,
(c) S - a special goal or start or distinguished symbol,
(d) P - a finite set of production rules or, simply, productions.

where a production relates to a pair of strings, say a and b, specifying how one may be transformed into the other:

a ® b where a e (N È T )* N (N È T )* , b e (N È T )*

and we can then define the language L(G) produced by the grammar G by the relation

L(G) = { w |  S Þ* w Ù w Î T * }

(a) In terms of this style of notation, define precisely (that is to say, mathematically; we do not want a long essay or English description) what you understand by [2 marks each]

(1) FIRST(s) where s e ( N È T )+

a Î FIRST(s)        if s Þ* at      (a Î T ; s , t Î (N È T )* )

(2) FOLLOW(A) where A e N

a Î FOLLOW(A)       if S Þ* xAaz      (A, S Î N ; a Î T ; x , z Î (N È T )* )

(3) A context-free grammar

a ® b where a e N , b e (N È T )*

(b) In terms of the notation here, concisely state the two rules that must be satisfied by the productions of a context free grammar in order for it to be classified as an LL(1) grammar. [6 marks]

Rule 1

For each non-terminal Ai Î N that admits alternatives

Ai ® xi1 | xi2 | . . . xin

the sets of initial terminal symbols of all strings that can be generated from each of the alternative xik must be disjoint, that is

FIRST(xij) Ç FIRST(xik) = ·     for all j ¹ k

Rule 2

For each non-terminal Ai Î N that admits alternatives

Ai ® xi1 | xi2 | . . . xin

but where xik Þ e for some k, the sets of initial terminal symbols of all sentences that can be generated from each of the xij for j ¹ k must be disjoint from the set FOLLOW(Ai) of symbols that may follow any sequence generated from Ai, that is

FIRST(xij) Ç FOLLOW(Ai) = ·,      j ¹ k

or, rather more loosely,

FIRST(Ai) Ç FOLLOW(Ai) = ·

3. Consider the following attempt to describe the activities at a Highland Gathering (not quite as you had it last week)

    COMPILER Gathering $CN
    /* Describes the Pipe Band events at a Highland Gathering
       P.D. Terry, Rhodes University, 2008 */

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Gathering   = { Competition } .
      Competition = MSR | Medley .
      MSR         = "March" "Strathspey" "Reel" [ "March" ] .
      Medley      = OneTune { OneTune } .
      OneTune     = "March" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" .
    END Gathering.

(a) Rewrite the productions to eliminate the meta brackets. [3 marks]

    PRODUCTIONS
      Gathering       = Competitions .
      Competitions    = Competition Competitions | .
      Competition     = MSR | Medley .
      MSR             = "March" "Strathspey" "Reel" OptionalMarch .
      OptionalMarch   =  "March" |  .
      Medley          = OneTune MoreTunes .
      MoreTunes       = OneTune MoreTunes | .
      OneTune         =   "March" | "Jig" | "Hornpipe" | "Reel"
                        | "Strathspey" MoreStrathspeys "Reel" .
      MoreStrathspeys = "Strathspey" MoreStrathspeys | .
    END Gathering.

(b) Explain whether or not this is an LL(1) grammar. If it is not an LL(1) grammar, suggest a small modification that could make it become one. [4 marks]

It is not an LL(1) Grammar.

   FIRST(MSR)    = { "March" }
   FIRST(Medley) = { "March", "Jig", "Hornpipe" , "Strathspey", "Reel" }

which have "March" in common. In addition the nullable non terminals need further analysis.

   FIRST(Competitions) = { "March" "Strathspey" "Reel" "Jig" "Hornpipe" }
   FOLLOW(Competition) = empty
   No problem

   FIRST(OptionalMarch)  = { "March" }
   FOLLOW(OptionalMarch) = { "March" "Strathspey" "Reel" "Jig" "Hornpipe" }
   Problem - Rule 2

   FIRST(MoreTunes)  = { "March" "Strathspey" "Reel" "Jig" "Hornpipe" }
   FOLLOW(MoreTunes) = { "March" "Strathspey" "Reel" "Jig" "Hornpipe" }
   Problem - Rule 2

   FIRST(MoreStrathspeys)  = { Strathspey" }
   FOLLOW(MoreStrathspeys) = { "Reel" }
   No problem

We could "fix" this by placing a distinctive announcement into each alternative for Competition

Competition = "MSR" MSR | "Medley" Medley .

(c) The grammar can be criticised for not really describing the competition structures. What is lacking, and how could you improve it (give the amended production(s)). [3 marks]

There is no concept of more than one competing band in each event. To incorporate this it might seem that we simply introduce something like

Competition = "MSR" { MSR } | "Medley" { Medley } .

but this introduces LL(1) errors again (basically a listener who is not a piper himself/herself/itself cannot tell where one competitor stops and the next one starts. So we should add (as obviously happens in practice) a "break" after each competitor:

Competition = "MSR" { MSR "break" } | "Medley" { Medley "break" } .

(d) Could the last production have been written as follows (without changing the language)? [2 marks]

      OneTune  = "March" | "Jig" | "Hornpipe" | "Reel" | { "Strathspey" } "Reel" . 

No, don't do that. You get an LL(1) error again. But you could try

OneTune = "March" | "Jig" | "Hornpipe" | { "Strathspey" } "Reel" .

Many people seemed to think that a medley MUST contain at least one strathspey. But that was not a feature of the original grammar. Its's very different saying "you must have a strathspey" or saying "you can have a strathspey if you wish, but if you have a strathspey it must be followed by another strathspey or by a reel".

4. A week or so back you worked with this Cocol grammar:

   COMPILER Calc  $CN
   /* Simple four function calculator
      P.D. Terry, Rhodes University, 2008 */

   CHARACTERS
     digit      = "0123456789" .
     hexdigit   = digit + "ABCDEF" .
   TOKENS
     decNumber  = digit { digit } .
     hexNumber  = "$" hexdigit { hexdigit } .

   IGNORE CHR(0) .. CHR(31)

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  |  "-" Term } .
     Term       = Factor { "*" Factor |  "/" Factor } .
     Factor     = decNumber | hexNumber .
   END Calc.

"Numbers" here could be simple decimal integers like 1234 or hexadecimal numbers like $DEAD.

Suppose we wanted a calculator that could recognize decimal numbers with an optional decimal point, as in

1232 12.34 0.34 .34 12.

and hexadecimal integer numbers using another common notation that insists on a trailing H, as in

03FH 0BACH

How would you modify the ATG file? (Hint: You should know this, but I suppose some of you might not spot it in the heat of the moment. You only have to modify the TOKENS section, not the PRODUCTIONS section, so don't waste time on the PRODUCTIONS section!) [5 marks]

The problem here is to make sure that all the valid combinations are allowed, unambiguously, insisting that at least one digit appear, and with no possibility of obtaining several periods in one number. So attempts like the following are wrong

   TOKENS
     decNumber = digit { digit } [ "." ] { digit }  .         (* ambiguous *)
     decNumber = [ "." ] digit { digit } [ "." { digit } ]  . (* can get 2 "." *)
     decNumber = digit { digit | "." } .                      (* can get 2 "." *)
     decNumber = { digit } [ "." { digit } ] .                (* nullable *)
     decNumber = { digit } [ "." ] { digit } .                (* 3rd time still unlucky - grin! *)

Here is a better one:

     decNumber =   digit { digit } [ "."  { digit } ]
                 | "." digit { digit }  .

For the hexadecimal numbers, care must be taken not to allow the token to be confused with an identifier. Okay, so you might argue that identifiers did not really come into this one, but something like this

     hexNumber = hexDigit { hexDigit } "H" ,

would allow, for example BACH or EACH. Much better definitions would be either of the following. The trailing H would allow the "semantic" part of the system to convert the string using base 16 rather than base 10.

     hexNumber = digit { hexDigit } "H" .
     hexNumber = "0" hexDigit { hexDigit } "H" .

5. Given the following parse tree, construct as much of the grammar as can be determined from that parse tree. Is this the complete grammar? If not, why not? [4+2 marks]

                                             S
                                             |
                        .--------------------+------------------------.
                        |                    |                        |
                        A                    B                        C
                        |                    |                        |
                  .-----------.         .----------.           .------+------.
                  |           |         |          |           |      |      |
                  B           d         e          f           g      C      g
                  |                                                   |
             .---------.                                         .--------.
             |         |                                         |        |
             e         f                                         A        d
                                                                 |
                                                                 |
                                                                 h

The productions that we can deduce from the tree are

S = A B C .
A = B d | h .
B = e f .
C = g C g | A d .

Well, it might be the complete grammar, but it does not have to be. There could have been almost any number of other alternatives for the right sides of these production rules, which could have introduced any number of other non-terminals and terminals. All great fun!

Several students are confused by the notions of a "complete" grammar (meaning that there is nothing missing from it) and a "reduced grammar" which is one in which all nonterminals are reacahable, and where one can derive strings of terminals only from any non-terminal.

6. Given the grammar

               S = a S | S B | d .
               B = B b | c .

the parse tree below shows a possible derivation of the string aadccb. Show that this grammar is ambiguous. [4 marks]

All we have to do is find another parse tree for the same sentence, which is easily done if we start by substitution from the second alternative for S rather than the first in advancing from the initial goal symbol. In this case one can find several such trees, a few of which are shown below.

           S                            S                          S                           S
           |                            |                          |                           |
      .--------.                   .--------.               .--------------.            .--------------.
      |        |                   |        |               |              |            |              |
      a        S                   a        S               S              B            S              B
               |                            |               |              |            |              |
           .--------.                  .---------.      .--------.      .------.    .--------.      .------.
           |        |                  |         |      |        |      |      |    |        |      |      |
           a        S                  S         B      a        S      B      b    S        B      B      b
                    |                  |         |               |      |           |        |      |
               .----------.        .-----.    .-----.         .-----.   |           |        |      |
               |          |        |     |    |     |         |     |   |         .-----.    |      |
               S          B        a     S    B     b         a     S   c         |     |    c      c
               |          |              |    |                     |             a     S
           .------.    .------.    .-----|    |                 .------.                |
           |      |    |      |    |     |    c                 |      |                |
           S      B    B      b    S     B                      S      B           .--------.
           |      |    |           |     |                      |      |           |        S
           |      |    |           |     |                      |      |           |        |
           d      c    c           d     c                      d      c           a        d


Home  © P.D. Terry