Test for week 22

Computer Science 3 - 2012 - Test on Week 22 - Solutions

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood LL(1) grammars and the like. YOU MAY NOT USE YOUR TEXTBOOK, NOTES or COMPUTER. Answer on the questions sheet (4 sides).

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. A chance to redeem yourself after last week, perhaps?

(a) Show how one might specify a numeric literal in Cocol, allowing for the possibility of numbers of any of the following forms [4]

      123    1.23    1.    .23    1.2E12    1.2E-12   1E6    1E-6    .234E6   .234E-6  1.2E+67

(b) Many programming languages allow identifiers to start with a letter and go on to contain a sequence of letters and digits and underscores (as exemplified by My_Module_Name, but subject to a restraint that the identifier cannot end with an underscore (so BadName_ would not be permited). Show how this could be done in Cocol. [3]

    CHARACTERS
      digit  = "0123456789" .
      letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

    TOKENS
      number =   "." digit { digit } [ ( "e" | "E" ) [ "+" | "-" ] digit { digit } ]
               | digit { digit } [ "." { digit } ] [ ( "e" | "E" ) ["+" | "-"] digit { digit } ] .

      identifier = letter { letter | digit | "_" { "_" } (letter | digit } .

or

      identifier = letter { { "_" } ( letter | digit ) } .

Remember that you cannot define one token in terms of others.

Notice that the definition of identifier allows for several underscores to appear in sequence. Several submissions omitted the optional { "_" } component.

3. Consider the following (silly) grammar

    Goal = { A } [ B ] { A C } Goal .
    A    = x | y .
    B    = p | q .
    C    = [ a | b ] .

(a) Write an equivalent grammar that does not use any of the meta-brackets. [3]

Here is one possibility

    Goal = A1 B1 AC1 Goal .
    A1   = A A1 | .
    A    = x | y .
    B1   = B | .
    B    = p | q .
    AC1  = A C AC1 | .
    C    = a | b | .

(b) For your equivalent grammar, indicate which are the nullable non-terminals. [2]

      Goal   A1   B1   AC1   C   (that is, all except A and B)

4. Consider the following grammar expressed in EBNF for describing the progress of an excellent compiler course offered at a University Near You:

Course = Introduction Section { Section } Conclusion .
Introduction = [ "lecture" "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(a) Is this an LL(1) grammar? If not, where does it break the rules? [3]

Most people could see that this was not an LL(1) grammer. Introduction is nullable, and

FIRST(Introduction) = { "lecture" },

while

FOLLOW(Introduction) = { "lecture" , "practical", "tutorial" , "handout" , "test" }.

Note that there is no problem with the nullable component of Section, or with the nullable component withing Conclusion. Some students said that because a Section could start with "practical" "test" and be followed by "test" that the grammar was non LL(1), but this is not true. Be careful where and how you test Rule 2.

(b) If it breaks the rules, do you suppose a recursive descent parser would have trouble recognizing the history of a course? Explain. [2]

A point I wish to stress here is that a recursive descent parser can handle some non-LL(1) grammars sensibly. The most obvious example of this is the "dangling else" grammar (go and read that part of the text again!). In this particular case, however, there would be trouble. If the string to be parsed began with "lecture" the parser would not know which route to follow unless it could look ahead to the next token - and even this would not help if the next token were "handout".

To take this further, suppose the grammar had been a very similar one:

Course = Introduction Section { Section } Conclusion .
Introduction = "lecture" [ "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

This is also non-LL(1), as the nullable component within Introduction has the FIRST set { "handout" } which has "handout" in common with the FOLLOW set. However a parser would have no trouble parsing a string like

"lecture" "handout" "lecture" ...

as it would simply associate the handout with an Introduction, and not with a following Section (this is completely analogous with the dangling else problem, of course).

(c) Do you think it is possible to complete the course described by this grammar without attending any lectures? (Explain your reasoning.) [2]

Yes, indeed. A string like

"test" "examination"

is the minimum one that is acceptable to the parser, one which would result if all the nullable options were exercised.

(d) (Bonus mark) So why are some of you missing some of my lectures? [1]

There is no agreement on the correct solution here. Probably the closest is that some people are just too lazy to attend, or have stayed up too late and over sleep. Crazy, really? I'm willing to bet that if you pay for a drink at the Rat you would drink it; since you've paid for a whole lot of drinks at the Fountain of Knowledge, why don't you drink those too? Those who submitted no solution but who are known always to attend earned the bonus mark anyway. (I don't often award marks for leaving an answer blank. It is not likely to happen again!)

5. Grammars don't get much simpler than this next one - at least to write down. It has one nonterminal A and two terminals ( and )

A = "(" A ")" A | e .

(a) Is it an LL(1) grammar? If not, why not? [4]

Lots of people got this wrong, which suggests that there is still a lot of confusion about where and how to apply tests for the LL(1) condition. The production for A has two alternatives. The FIRST sets for these two alternatives are { "(" } and the empty set F, so Rule 1 is trivially satisfied. Since A is nullable we must test Rule 2 (we only need do this if a non-terminal is nullable; people miss this point). To determine FOLLOW(A) look at the right hand sides of the productions for instances of A. This only happens in the situation where A is followed by ")", so FOLLOW(A) = { ")" } and so Rule 2 is satisfied, and the grammar is LL(1). Many people dreamt up the idea that FOLLOW(A) = { "(" , ")" } but I don't understand why.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's the language of "matched parentheses". Strings of nested parentheses are permitted, such as ( ( ( ) ) ), but so are strings like ( ) ( ) ( ), and of course combinations of these like ( ( ( ) ) ) ( ) ( ( ) ( ) ). Cute, really - amazing what a simple grammar can do. Note that the "language" cannot contain any As in the strings it produces; these can only appear in the intermediate sentential forms.

It cannot be an ambiguous grammar - we have shown it to be an LL(1) grammar, and hence it must be both deterministic and non-ambiguous.

6. Repeat question 5 for a grammar that looks rather similar but does not have an e:

A = "(" A ")" | "(" ")" | A A .

(a) Is it an LL(1) grammar? If not, why not? [2]

This one is not LL(1). There are three alternatives for A, namely "(" A ")" and "(" ")" and AA. All three alternatives have the same FIRST set, namely { "(" } so Rule 1 is broken.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's also the language of "matched parentheses" - the two grammars are almost equivalent in that sense. Not quite equavalnet - the firts one permits a totally empty string. I'm surprised so many people did not spot this!

Well, many fell into the trap of saying "if a grammar is non LL(1), it must be ambiguous". That is not correct, though the opposite deduction, namely that "if a grammar is LL(1), it cannot be ambiguous", always holds (funny stuff, logic!). You have seen several examples of non-LL(1) grammars that are non ambiguous, for example the one for Expressions in Chapter 6:

(G1) Expression = Term | Expression "-" Term .       (1, 2)
          Term = Factor | Term "*" Factor .        (3, 4)
          Factor = "a" | "b" | "c" .        (5, 6, 7)
          Factor = "a" | "b" | "c" .       (5, 6, 7)

However, in this case the grammar for matched parentheses is ambiguous. To convince a reader, it is necessary only to find one example of a string that can be parsed in two different ways and show the two ways. ()()() is such a string. This would come about from a sentential form AAA, but this form AAA could have been reached in two ways

A = AA = AA A

or

A = AA = A AA

Before we leave Questions 5 and 6, here is yet another grammar equivalent to the first one, this time a variation on the one in question 5

A = A "(" A ")" | e .

Can you see why this one is not LL(1), and also convince yourself that it is non-ambiguous?

7. Time to be more creative.

Steven Spielberg is about to film his latest blockbuster "Elevator 9/11", in which a group of gorgeous Americans (guess what) are trapped in an elevator hundreds of feet above the ground by a group of ugly ... (you guessed right again). He wants to make sure he understands how to arrange for the elevator to get stuck. (So would you if you had the chance to spend a few hours opposite/underneath/on top of some of the hunks and bimbos he has lined up to act for him).

Mr Spielberg has been reliably informed that some high-rise skyscrapers have lifts (elevators) that are based partway up the building, say on level (floor) X. People entering one of these elevators have a choice of pushing an UP key or a DOWN key as many times as they like. If the lift is behaving normally it can go up or down one level for each push of the appropriate key, but it cannot drop below level X, and at the end of the day it is supposed to have returned to level X again. Write down the productions for a simple grammar that will allow Mr S to recognize a sequence of commands that does not trap the lift, such as

UP UP DOWN DOWN or UP UP DOWN UP DOWN UP DOWN DOWN

Sequences that effectively jam the lift, such as

DOWN UP or UP UP DOWN

should be rejected by the grammar. (Hint: Treat UP and DOWN as key words <grin>)

[4]

Oh what fun! This one really sorted you out. As far as I could make out, only two students managed to see through all the hot air to the underlying simplicity. The keypresses have to balance in the same way that the parentheses in the previous questions had to balance, so all it needs is the same grammar, essentially, as for matched parentheses, save that you use UP and DOWN in place of ( and ):

NoJam = "UP" NoJam "DOWN" NoJam | e .

Solutions submiited on the lines of

NoJam = "UP" { NoJam } "DOWN" | "DOWN" { NoJam } "UP" .

were no use, of course, because you were not allowed to drop below the initial level. Solutions like

NoJam = "UP" { "UP" "DOWN" } "DOWN" .

were submitted by some students. This certainly matches the two correct examples suggested, but these people fell into the trap of defining a grammar that only described a subset of the language, and not the complete language.
A grammar like

NoJam = "UP" { NoJam } "DOWN" .

at first looks promising. But it cannot describe a valid sequence like

UP UP DOWN UP DOWN UP DOWN DOWN .

Home © P.D. Terry