Test for week 4

Computer Science 3 - 2017 - Test on Week 4 - Solutions

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood LL(1) grammars and the like. YOU MAY NOT USE YOUR TEXTBOOK, NOTES or COMPUTER. Answer on the questions sheet (4 sides).

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. Suppose you are asked to extend the description of Parva to allow a programmer to incorporate "inline" PVM assembler statements, as exemplified by

           if (x > y)
             pvm {
               PRNS  "Assigning -9 to variable 2"
               LDA   2
               LDC   -9
               STO
            }

Here is a new statement definition that attempts to allow this (you would obviously also need a trivial change to the production for Statement to be able to make this into a "useful production").

      InlineStatement
      = "pvm" "{"
          { identifier [ "+" | "-" ] [ number | stringLit ] }
        "}" .

Does this production really describe the new statement type properly? If not, suggest how it should be improved by giving a better production or productions. You don't have to remember all the PVM opcodes, just use a few of them if you need to make a particular point. [6]

This was surprisingly badly done. Just how far had you got with the last practical exercise, which really posed much the same problem? I had hoped that everyone would have seen that this was far too "permissive" - not all identifiers are valid opcode mnemonics, and not all opcodes can take arbitrary parameters, nor is it legitimate to write statements like

LDC +
PRNS - "how long is a piece of string?"

What I had hoped to get was something on the lines of the following:

InlineStatement = "pvm" "{" { Code } "}" . Code = OneWord | TwoWord | PrintString . OneWord = "NOP" | "NEG" | "ADD" | "SUB" | "MUL" | "DIV" | "REM" | "AND" | "OR" | "NOT" | "ANEW" | "LDXA" | "LDV" | "HALT" | "CEQ" | "CNE" | "CLT" | "CLE" | "CGT" | "CGE" | "STK" | "STO" | "INPI" | "INPB" | "PRNI" | "PRNB" | "PRNL" . TwoWord = ( "LDC" | "LDA" | "DSP" | "BRN" | "BZE" ) [ "+" | "-" ] number . PrintString = "PRNS" stringLit .

Some years ago when I set this exercise, one perceptive person pointed out, the ability to have BRN and BZE statements would be very nice, but the system would be almost totally unusable, since computing the target addresses would be a nightmare. We would need to have some sort of label system, as defined perhaps by the grammar below.

I had not thought of this complication when I set the problem but he or she was quite correct. As I have said before, every year at least one student comes up with something that I have not thought of, and for which I am grateful. That is really why I go to all my lectures - I learn so much in them (why not try that for yourselves?).

        InlineStatement = "pvm" "{" { Code } "}" .
        Code            = [ Label ] ( OneWord | TwoWord | Branch | PrintString ) .

        OneWord         =   "NOP" | "NEG"  | "ADD"  | "SUB"  | "MUL"  | "DIV" | "REM"
                          | "AND" | "OR"   | "NOT"  | "ANEW" | "LDXA" | "LDV" | "HALT"
                          | "CEQ" | "CNE"  | "CLT"  | "CLE"  | "CGT"  | "CGE" | "STK"
                          | "STO" | "INPI" | "INPB" | "PRNI" | "PRNB" | "PRNL" .

        TwoWord         = ( "LDC" | "LDA"  | "DSP" ) [ "+" | "-" ] number .

        Branch          = ( "BRN"  | "BZE" ) ( Label | number )

        PrintString     = "PRNS" stringLit .

        Label           = identifier .

3. Consider the following (silly) grammar

    Goal = { A } [ C ] { A C } Goal .
    A    = x | y .
    B    = p | q .
    C    = [ a | b ] .

(a) Write an equivalent grammar that does not use any of the meta-brackets (you may introduce further non-terminals if you wish, but your grammar should retain the non-terminals already present). [3]

Here is one possibility. Keep it simple - introduce a new non-terminal for each of the sections in metabrackets and use right recursion to help keep it LL(1)

    Goal  = SeqA OptC SeqAC Goal .
    SeqA  = A SeqA | ε .
    A     = x | y .
    OptC  = C | ε .
    B     = p | q .
    SeqAC = A C SeqAC | ε .
    C     = a | b | ε .

(b) For your equivalent grammar, indicate which are nullable non-terminals? [2]

      Goal   SeqA   OptC   SeqAC   C   (that is, all except A and B.  B of course is unreachable.)

(c) There is an easy way to see that the original grammar is ambiguous. Why is it ambiguous, and how could a simple change remove this ambiguity? [2]

The component [ C ] in the first production makes for an effectively optional part [ [ a | b ] ]. Just which ε would you like? You can't have both and should remove one of them.

4. Grammars don't get much simpler than this next one - at least to write down. It has one nonterminal A and two terminals, the parentheses ( and ).

A = "(" A ")" A | ε .

(a) Is it an LL(1) grammar? If not, why not? [4]

Too many people got this wrong, which suggests that there is still a lot of confusion about where and how to apply tests for the LL(1) condition. The production for A has two alternatives. The FIRST sets for these two alternatives are { "(" } and the empty set Φ, so Rule 1 is trivially satisfied. Since A is nullable we must test Rule 2 (we only need do this if a non-terminal is nullable; people miss this point). To determine FOLLOW(A) look at the right hand sides of the productions for instances of A. This only happens in the situation where A is followed by ")", so FOLLOW(A) = { ")" } and so Rule 2 is satisfied, and the grammar is LL(1). Some people dreamt up the idea that FOLLOW(A) = { "(" , ")" } but I don't understand why.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's the language of "matched parentheses". Strings of nested parentheses are permitted, such as ( ( ( ) ) ), but so are strings like ( ) ( ) ( ), and of course combinations of these like ( ( ( ) ) ) ( ) ( ( ) ( ) ). Cute, really - amazing what a simple grammar can do. Note that the "language" cannot contain any As in the strings it produces; these can only appear in the intermediate sentential forms.

It cannot be an ambiguous grammar - we have shown it to be an LL(1) grammar, and hence it must be both deterministic and non-ambiguous.

5. Repeat question 4 for a grammar that looks rather similar but does not have an ε:

A = "(" A ")" | "(" ")" | A A .

(a) Is it an LL(1) grammar? If not, why not? [2]

This one is not LL(1). There are three alternatives for A, namely "(" A ")" and "(" ")" and AA. All three alternatives have the same FIRST set, namely { "(" } so Rule 1 is broken.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's also a language of "matched parentheses" - the two grammars are almost equivalent in that sense. Not quite equivalent - the first one permits a totally empty string. I'm surprised so many people did not spot this, or at least comment on it.

Well, many fell into the trap of saying "if a grammar is non LL(1), it must be ambiguous". That is not correct, though the opposite deduction, namely that "if a grammar is LL(1), it cannot be ambiguous", always holds (funny stuff, logic!). You have seen several examples of non-LL(1) grammars that are non-ambiguous, for example the left recursive one for Expressions in Chapter 6:

(G1) Expression = Term | Expression "-" Term .       (1, 2)
          Term = Factor | Term "*" Factor .        (3, 4)
          Factor = "a" | "b" | "c" .        (5, 6, 7)

However, in this case the grammar for matched parentheses is ambiguous. To convince a reader, it is necessary only to find one example of a string that can be parsed in two different ways and show the two ways. ()()() is such a string. This would come about from a sentential form AAA, but this form AAA could have been reached in two ways

A = AA = AA A

or

A = AA = A AA

Before we leave Questions 4 and 5, here is yet another grammar equivalent to the first one, this time a variation on the one in question 4

A = A "(" A ")" | ε .

Can you see why this one is not LL(1), and also convince yourself that it is non-ambiguous?

6. Steven Spielberg is about to film his latest blockbuster "Elevator 9/11", in which a group of gorgeous Americans (guess what) are trapped in an elevator hundreds of feet above the ground by a group of ugly ... (you guessed right again). He wants to make sure he understands how to arrange for the elevator to get stuck. (So would you if you had the chance to spend a few hours opposite/underneath/on top of some of the hunks and bimbos he has lined up to act for him).

Mr Spielberg has been reliably informed that some high-rise skyscrapers have lifts (elevators) that are based partway up the building, say on level (floor) X. People entering one of these elevators have a choice of pushing an UP key or a DOWN key as many times as they like. If the lift is behaving normally it can go up or down one level for each push of the appropriate key, but it cannot drop below level X, and at the end of the day it is supposed to have returned to level X again. Write down the productions for a simple grammar that will allow Mr S to recognize a sequence of commands that does not trap the lift, such as

UP UP DOWN DOWN or UP UP DOWN UP DOWN UP DOWN DOWN

Sequences that effectively jam the lift, such as

DOWN UP or UP UP DOWN

should be rejected by the grammar. (Hint: Treat UP and DOWN as key words <grin>) [4]

As far as I can remember, only two students managed to see through all the hot air to the underlying simplicity. The keypresses have to balance in the same way that the parentheses in the previous questions had to balance, so all it needs is the same grammar, essentially, as for matched parentheses, save that you use UP and DOWN in place of ( and ):

NoJam = "UP" NoJam "DOWN" NoJam | ε .

Solutions submitted on the lines of

NoJam = "UP" { NoJam } "DOWN" | "DOWN" { NoJam } "UP" .

were no use, of course, because you were not allowed to drop below the initial level. Solutions like

NoJam = "UP" { "UP" "DOWN" } "DOWN" .

were submitted by some students. This certainly matches the two correct examples suggested, but these people fell into the trap of defining a grammar that only described a subset of the language, and not the complete language.
A grammar like

NoJam = "UP" { NoJam } "DOWN" .

at first looks promising. But it cannot describe a valid sequence like

UP UP DOWN UP DOWN UP DOWN DOWN .

As I have often said (and doubtless will often say again) I am always delighted when students come up with neat solutions to my problems that I had not thought of first. While still wondering why the inventors of these solutions had not seen what to me was the obvious solution above, two correct solutions were the following:

NoJam = "UP" [ NoJam ] "DOWN" [ NoJam ] .

NoJam = { HaveFun } .
HaveFun = "UP" NoJam "DOWN" .

Finally, several students submitted

NoJam = "UP" NoJam "DOWN" | "UP" "DOWN" | NoJam NoJam .

which intrigued me even more. This is the ambiguous grammar of question 5 - but why use an ambiguous grammar when you don't have to?

6. Who said that practicals were all plain sailing? Here is an attempt to describe the progress of one of my practicals. This one has only two tasks, which can be attempted in either order after the group has studied the handout and installed the prac kit. Under the guidance of your well-known tutors, each group works away attempting to solve one problem at a time and, at each attempt, may seek help from the pool of tutors. Based on the reputation and skill of the tutors it has been suggested that the scenario can be described as follows, regarding the quoted words as terminals for this purpose.

    Practical    = "PracTest" GetOrganised Tasks .
    GetOrganised = "Handout" "PracKit" .
    Tasks        = ( Task1 | Task2 ) "Supper" ( Task2 | Task1 ) .
    Task1        = { "Try1" { Help } } "Submit1" .
    Task2        = { "Try2" { Help } } "Submit2" .
    Help         = "Marq" [ "Damon"] | "Damon" [ "Marq" "Kefentse" ] | "Kefentse" | "Pat" .

(a) Some groups prefer to install the prac kit before reading the handout, but this grammar does not allow that. What would you have to change to permit this freedom (and that means "make the change, not waffle!")? And, of course, some of those students don't bother with getting the handout in any case, I have noticed. Modify the grammar to accept their behaviour as well. [4]

We need change only the production for GetOrganised. This does not introduce any LL(1) complications.

    GetOrganised = "Handout" "PracKit" | "PracKit" [ "HandOut" ] .

(b) The grammar has been written to try to specify that the two tasks can be done in either order. Does it succeed, or does it need further modification? [3]

It most certainly does not! The present production for Tasks insists on two tasks, but would accept any of the four sequences Task1 Supper Task2, Task2 Supper Task1, Task1 Supper Task1 and Task2 Supper Task2, this allowing one task to be avoided altogether. What is required is a replacement:

    Tasks  =  Task1 "Supper"  Task2   |   Task2 "Supper" Task1 .

(c) While there is clearly a lot of Help to be had, the production for acquiring it seems fairly complicated. Is this an LL(1) grammar? Can you justify your answer? Is it perhaps an ambiguous grammar? Can you justify your answer? Can you simplify the process? [6]

To check whether it is an LL(1) grammar we need to check on productions with alternatives. The suggested productions for Tasks clearly satisfy this rule, since FIRST(Task1) = { "Try1" , "Submit1" } and similarly FIRST(Task2) = { "Try2" , "Submit2" }.

The production for Task1 has a nullable section { "Try1" { Help } }, For this one the FIRST set is { "Try1" } and the FOLLOW set is { "Submit1" } so Rule 2 holds. There is an internal nullable section { Help }, and for this the FIRST set is { "Marq", "Damon", "Kefentse", "Pat" } while the FOLLOW set is { "Submit1", "Try1" } and again Rule 2 is satisfied.

The production for Help itself is more interesting. The four alternatives clearly satisfy Rule 1, and the FIRST sets are all singleton sets with a unique tutor name each. But since Help is contained within braces, we must examine FIRST( [ "Damon" ] ) and FOLLOW( [ "Damon" ] ). These are the sets { "Damon" } and { "Damon", "Marq", "Kefentse", "Pat" } which clearly spoils Rule 2. The grammar is ambiguous as well - the sequence "Damon" "Marq" "Pat" could be derived with two repetitions of Help - {Damon Marq) (Pat) or with three repetitions - (Damon []) (Marq []) (Pat).

Of course we can get the syntax we require with a much simpler production for Help:

    Help  =  "Marq" | "Damon" | "Kefentse" | "Pat" .

although it might be argued that this does not capture the (suggested) cooperation of Marq with Damon, Marq with Kefentse, and the isolation of Pat, which maybe one was trying to capture in the original grammar.

(d) Is it possible to do practicals as described here without any help and/or without even trying? How? Why do some of you ask for so little help? [3]

Indeed, yes. The productions for Task1 and Task2 both afford the chance of incorporating only Submit1 and Submit2. You can submit without trying, but you cannot get help without trying - at least as decreed by the grammar.

As for the other issue - well, I don't know either. I am nearly always ready to give help, and enjoy doing so.