Computer Science 3 - 2014 - Test on Week 4 - Solutions

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood LL(1) grammars and the like. YOU MAY NOT USE YOUR TEXTBOOK, NOTES or COMPUTER. Answer on the questions sheet (4 sides).

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. The following grammar attempts to describe expressions incorporating the familiar operators with their correct precedence and associativity.

   COMPILER Expression $CNF
   IGNORE CHR(0) .. CHR(31)
   PRODUCTIONS
     Expression = Term    { ( "+" | "-" ) Term  } .
     Term       = Factor  { ( "*" | "/" ) Factor } .
     Factor     = Primary [ "^" Expression ] .
     Primary    = "a" | "b" | "c" .
  END Expression.

Is this an ambiguous grammar? (Hint: try to find an expression that can be parsed in more than one way).

Many people "guessed" the right answer. To justify the claim that it is ambiguous it would be as well to show a pair of parse trees, not just make a wild claim! Considering the expression a^b*c. This can indeed be parsed in two ways, one with the implicit meaning of a^(b*c) and the other with the meaning of (a^b)*c. The parse trees would look like this (a few intermediate nodes have been omitted to save space)

           Expression                                        Expression
               |                                                 |
             Term                                              Term
               |                                                 |
               |                                       .---------+--------.
               |                                       |         |        |
            Factor                                  Factor      "*"     Factor
               |                                       |                  |
        .------+---------.                             |---------.        |
        |      |         |                             |  |      |        |
     Primary  "^"    Expression                   Primary ^  Expression   |
        |                |                             |         |        |
        |               Term                           |       Term       |
        |         .------+-----.                       |         |        |
        |         |      |     |                       |         |        |
        |       Factor  "*"  Factor                    |       Factor     |
        |         |            |                       |         |        |
        |         |            |                       |         |        |
        a         b            c                       a         b        c

Is it an LL(1) grammar? If not, why not, and can you find a suitable grammar that is LL(1)?

It cannot be an LL(1) grammar if it is ambiguous, but let us see which rules are broken. If we rewrite the first grammar to eliminate the metabrackets we get

     Expression = Term TailExp .
     TailExp    = AddOp Term TailExp | e .
     Term       = Factor TailTerm .
     TailTerm   = MulOp Factor TailTerm | e .
     Factor     = Primary TailFactor .
     TailFactor = "^" Expression | e .
     Primary    = "a" | "b" | "c" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

The nullable nonterminals here are TailExp, TailTerm and TailFactor.

   FIRST(TailExp)    = { "+" , "-" }
   FIRST(TailTerm)   = { "*" , "/" }
   FIRST(TailFactor) = { "^" }

The FOLLOW sets are a little harder to see because to get to closure one has to chase through quite a few other productions:

   FOLLOW(TailExp)    = FOLLOW(Expression)
   FOLLOW(TailTerm)   = FOLLOW(Term) = FIRST(TailExp) U FOLLOW(Expression)
   FOLLOW(TailFactor) = FOLLOW(Factor) = FIRST(TailTerm) U FOLLOW(Term)

You are invited to track these through in detail; the outcome is that they are all the same:

   FOLLOW(TailExp)    = { "*" , "/" , "+" , "-" , EOF }
   FOLLOW(TailTerm)   = { "*" , "/" , "+" , "-" , EOF }
   FOLLOW(TailFactor) = { "*" , "/" , "+" , "-" , EOF }

and so Rule 2 is broken for TailExp and for TailTerm.

Finding an LL(1), unambigous grammar, with the correct precedence and associativity is not too difficult. In fact it would have been incredibly easy had you just studied section 6.2 of the text, where the solution is effectively given to you.

   COMPILER Expression $CNF
   IGNORE CHR(0) .. CHR(31)
   PRODUCTIONS
     Expression = Term    { ( "+" | "-" ) Term  } .
     Term       = Factor  { ( "*" | "/*" ) Factor } .
     Factor     = Primary [ "^" Factor ] .
     Primary    = "a" | "b" | "c" .
  END Expression.

Why does our grammar now satisfy the LL(1) constraints? Rewritten it becomes

     Expression = Term TailExp .
     TailExp    = AddOp Term TailExp | e .
     Term       = Factor TailTerm .
     TailTerm   = MulOp Factor TailTerm | e .
     Factor     = Primary TailFactor .
     TailFactor = "^" Factor | e .
     Primary    = "a" | "b" | "c" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

The nullable nonterminals here are still TailExp, TailTerm and TailFactor.

   FIRST(TailExp)    = { "+" , "-" }
   FIRST(TailTerm)   = { "*" , "/" }
   FIRST(TailFactor) = { "^" }

The FOLLOW sets are a little harder: to get to closure one has to chase through quite a few other productions:

   FOLLOW(TailExp)    = FOLLOW(Expression)
   FOLLOW(TailTerm)   = FOLLOW(Term) = FIRST(TailExp) U FOLLOW(Expression)
   FOLLOW(TailFactor) = FOLLOW(Factor) = FIRST(TailTerm) U FOLLOW(Term)

You are invited to track these through in detail; the outcome is:

   FOLLOW(TailExp)    = { EOF }
   FOLLOW(TailTerm)   = { "+" , "-" , EOF }
   FOLLOW(TailFactor) = { "*" , "/" , "+" , "-" , EOF }

and so Rule 2 is no longer broken.

There were various other suggestions made, such as

     Factor     = Primary [ "^" Term ] .
     Factor     = Primary [ "^" "(" Expression ")" ] .
     Factor     = Primary { "^" Term } .

but these are unnecessarily restrictive (first suggestion) or non-equivalent (second suggestion; parentheses were not catered for in the first grammar and introducing them is "cheating"). The third suggestion gets the associativity incorrect.

3. Consider the following (silly) grammar

    Goal = { A } [ B ] { A C } Goal .
    A    = x | y .
    B    = p | q .
    C    = [ a | b ] .

(a) Write an equivalent grammar that does not use any of the meta-brackets. [3]

Here is one possibility

    Goal = A1 B1 AC1 Goal .
    A1   = A A1 | e .
    A    = x | y .
    B1   = B | e .
    B    = p | q .
    AC1  = A C AC1 | e .
    C    = a | b | e .

(b) For your equivalent grammar, indicate which are the nullable non-terminals. [2]

      Goal   A1   B1   AC1   C   (that is, all except A and B)

4. Consider the following grammar expressed in EBNF for describing the progress of an excellent compiler course offered at a University Near You:

Course = Introduction Section { Section } Conclusion .
Introduction = [ "lecture" "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(a) Is this an LL(1) grammar? If not, where does it break the rules? [3]

Most people could see that this was not an LL(1) grammer. Introduction is nullable, and

FIRST(Introduction) = { "lecture" },

while

FOLLOW(Introduction) = { "lecture"  , "practical" , "tutorial" , "handout" , "test" }.

Note that there is no problem with the nullable component of Section, or with the nullable component withing Conclusion. Some students said that because a Section could start with "practical" "test" and be followed by "test" that the grammar was non LL(1), but this is not true. Be careful where and how you test Rule 2.

(b) If it breaks the rules, do you suppose a recursive descent parser would have trouble recognizing the history of a course? Explain. [2]

A point I wish to stress here is that a recursive descent parser can handle some non-LL(1) grammars sensibly. The most obvious example of this is the "dangling else" grammar (go and read that part of the text again!). In this particular case, however, there would be trouble. If the string to be parsed began with "lecture" the parser would not know which route to follow unless it could look ahead to the next token - and even this would not help if the next token were "handout".

To take this further, suppose the grammar had been a very similar one:

Course = Introduction Section { Section } Conclusion .
Introduction = "lecture" [ "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

This is also non-LL(1), as the nullable component within Introduction has the FIRST set { "handout" } which has "handout" in common with the FOLLOW set. However a parser would have no trouble parsing a string like

"lecture" "handout" "lecture" ...

as it would simply associate the handout with an Introduction, and not with a following Section (this is completely analogous with the dangling else problem, of course).

(c) Do you think it is possible to complete the course described by this grammar without attending any lectures? (Explain your reasoning.) [2]

Yes, indeed. A string like

"test" "examination"

is the minimum one that is acceptable to the parser, one which would result if all the nullable options were exercised.

(d) (Bonus mark) So why are some of you missing some of my lectures? [1]

There is no agreement on the correct solution here. Probably the closest is that some people are just too lazy to attend, or have stayed up too late and over sleep. Crazy, really? I'm willing to bet that if you pay for a drink at the Rat you would drink it; since you've paid for a whole lot of drinks at the Fountain of Knowledge, why don't you drink those too? Those who submitted no solution but who are known always to attend earned the bonus mark anyway. (I don't often award marks for leaving an answer blank. It is not likely to happen again!)

5. Grammars don't get much simpler than this next one - at least to write down. It has one nonterminal A and two terminals ( and )

A = "(" A ")" A | e .

(a) Is it an LL(1) grammar? If not, why not? [4]

Lots of people got this wrong, which suggests that there is still a lot of confusion about where and how to apply tests for the LL(1) condition. The production for A has two alternatives. The FIRST sets for these two alternatives are { "(" } and the empty set F, so Rule 1 is trivially satisfied. Since A is nullable we must test Rule 2 (we only need do this if a non-terminal is nullable; people miss this point). To determine FOLLOW(A) look at the right hand sides of the productions for instances of A. This only happens in the situation where A is followed by ")", so FOLLOW(A) = { ")" } and so Rule 2 is satisfied, and the grammar is LL(1). Many people dreamt up the idea that FOLLOW(A) = { "(" , ")" } but I don't understand why.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's the language of "matched parentheses". Strings of nested parentheses are permitted, such as ( ( ( ) ) ), but so are strings like ( ) ( ) ( ), and of course combinations of these like ( ( ( ) ) ) ( ) ( ( ) ( ) ). Cute, really - amazing what a simple grammar can do. Note that the "language" cannot contain any As in the strings it produces; these can only appear in the intermediate sentential forms.

(c) Is it an ambiguous grammar? Explain your reasoning. [2]

It cannot be an ambiguous grammar - we have shown it to be an LL(1) grammar, and hence it must be both deterministic and non-ambiguous.

6. Repeat question 5 for a grammar that looks rather similar but does not have an e:

A = "(" A ")" | "(" ")" | A A .

(a) Is it an LL(1) grammar? If not, why not? [2]

This one is not LL(1). There are three alternatives for A, namely "(" A ")" and "(" ")" and AA. All three alternatives have the same FIRST set, namely { "(" } so Rule 1 is broken.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's also the language of "matched parentheses" - the two grammars are almost equivalent in that sense. Not quite equavalnet - the firts one permits a totally empty string. I'm surprised so many people did not spot this!

(c) Is it an ambiguous grammar? Explain your reasoning. [2]

Well, many fell into the trap of saying "if a grammar is non LL(1), it must be ambiguous". That is not correct, though the opposite deduction, namely that "if a grammar is LL(1), it cannot be ambiguous", always holds (funny stuff, logic!). You have seen several examples of non-LL(1) grammars that are non ambiguous, for example the one for Expressions in Chapter 6:

(G1)   Expression = Term | Expression "-" Term .       (1, 2)
          Term = Factor | Term "*" Factor .        (3, 4)
          Factor = "a" | "b" | "c" .        (5, 6, 7)
          Factor = "a" | "b" | "c" .       (5, 6, 7)

However, in this case the grammar for matched parentheses is ambiguous. To convince a reader, it is necessary only to find one example of a string that can be parsed in two different ways and show the two ways. ()()() is such a string. This would come about from a sentential form AAA, but this form AAA could have been reached in two ways

A  = AA   = AA   A

or

A  = AA   = A   AA

Before we leave Questions 5 and 6, here is yet another grammar equivalent to the first one, this time a variation on the one in question 5

A = A "(" A ")" | e .

Can you see why this one is not LL(1), and also convince yourself that it is non-ambiguous?

7. Time to be more creative.

Steven Spielberg is about to film his latest blockbuster "Elevator 9/11", in which a group of gorgeous Americans (guess what) are trapped in an elevator hundreds of feet above the ground by a group of ugly ... (you guessed right again). He wants to make sure he understands how to arrange for the elevator to get stuck. (So would you if you had the chance to spend a few hours opposite/underneath/on top of some of the hunks and bimbos he has lined up to act for him).

Mr Spielberg has been reliably informed that some high-rise skyscrapers have lifts (elevators) that are based partway up the building, say on level (floor) X. People entering one of these elevators have a choice of pushing an UP key or a DOWN key as many times as they like. If the lift is behaving normally it can go up or down one level for each push of the appropriate key, but it cannot drop below level X, and at the end of the day it is supposed to have returned to level X again. Write down the productions for a simple grammar that will allow Mr S to recognize a sequence of commands that does not trap the lift, such as

UP UP DOWN DOWN or UP UP DOWN UP DOWN UP DOWN DOWN

Sequences that effectively jam the lift, such as

DOWN UP or UP UP DOWN

should be rejected by the grammar. (Hint: Treat UP and DOWN as key words <grin>)

[4]

Oh what fun! This one really sorted you out. As far as I could make out, only two students managed to see through all the hot air to the underlying simplicity. The keypresses have to balance in the same way that the parentheses in the previous questions had to balance, so all it needs is the same grammar, essentially, as for matched parentheses, save that you use UP and DOWN in place of ( and ):

NoJam = "UP" NoJam "DOWN" NoJam | e .

Solutions submiited on the lines of

NoJam = "UP" { NoJam } "DOWN" | "DOWN" { NoJam } "UP" .

were no use, of course, because you were not allowed to drop below the initial level. Solutions like

NoJam = "UP" { "UP" "DOWN" } "DOWN" .

were submitted by some students. This certainly matches the two correct examples suggested, but these people fell into the trap of defining a grammar that only described a subset of the language, and not the complete language.

A grammar like

NoJam = "UP" { NoJam } "DOWN" .

at first looks promising. But it cannot describe a valid sequence like

UP UP DOWN UP DOWN UP DOWN DOWN .


Home  © P.D. Terry