Prac 22 Test

Computer Science 3 - 2004 - Test on Prac 22 - Solutions

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood the material in the prac questions. You may use your textbook if you think it will help. Answer on the questions sheet (4 sides).

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. A week or so back you worked with this Cocol grammar:

   COMPILER Calc  $CN
   /* Simple four function calculator
      P.D. Terry, Rhodes University, 2004 */

   CHARACTERS
     digit      = "0123456789" .
     hexdigit   = digit + "ABCDEF" .

   TOKENS
     decNumber  = digit { digit } .
     hexNumber  = "$" hexdigit { hexdigit } .

   IGNORE CHR(0) .. CHR(31)

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  |  "-" Term } .
     Term       = Factor { "*" Factor |  "/" Factor } .
     Factor     = decNumber | hexNumber .
   END Calc.

"Numbers" here could be simple decimal integers like 1234 or hexadecimal numbers like $DEAD.

Suppose we wanted a calculator that could recognize decimal numbers with an optional decimal point, as in

1232 12.34 0.34 .34 12.

and hexadecimal integer numbers using another common notation that insists on a trailing H, exemplified by

03FH 0BACH

How would you modify the ATG file? (Hint: You should know this, but I suppose some of you might not spot it in the heat of the moment. You only have to modify the TOKENS section, not the PRODUCTIONS section, so don't waste time on the PRODUCTIONS section!) [4]

The problem here is to make sure that all the valid combinations are allowed, unambiguously, insisting that at least one digit appear, and with no possibility of obtaining several periods in one number. So attempts like the following are wrong

   TOKENS
     decNumber = digit { digit } [ "." ] { digit }  .         (* ambiguous *)
     decNumber = [ "." ] digit { digit } [ "." { digit } ]  . (* can get 2 "." *)
     decNumber = digit { digit | "." } .                      (* can get 2 "." *)
     decNumber = { digit } [ "." { digit } ] .                (* nullable *)
     decNumber = { digit } [ "." ] { digit } .                (* 3rd time still unlucky - grin! *)

Here is a better one:

decNumber = digit { digit } [ "." { digit } ] | "." digit { digit } .

For the hexadecimal numbers, care must be taken not to allow the token to be confused with an identifier. Okay, so you might argue that identifiers did not really come into this one, but something like this

hexNumber = hexDigit { hexDigit } "H" ,

would allow, for example BACH or EACH. Much better definitions would be either of the following. The trailing H would allow the "semantic" part of the system to convert the string using base 16 rather than base 10.

hexNumber = digit { hexDigit } "H" . hexNumber = "0" hexDigit { hexDigit } "H" .

2. A very quick and dirty solution to the stack assembler grammar was picked out of the rubbish bin early this morning. The PRODUCTIONS section read

   PRODUCTIONS
     Program = { Statement } .
     Statement = [ [ "{" ] number [ "}" ] ] Opcode [ number | string | word ] EOL .
     Opcode = "LDA" | "LDV" | "BZE" | (* all the rest were there ... *)
              "STO" | "INPI" | "PRNS" .
   END Program.

What critical message do you think the marker would have scribbled on this, if it had been handed in? (Assume that the TOKENS, COMMENTS, CHARACTERS and IGNORE sections are correct - focus your attention simply on the PRODUCTIONS as given here). [3]

Clearly this one, at least, had been encountered in the practical itself. The worst problem is that no distinction is drawn between the various classes of opcodes, and this is so easily specified that it should have been done. A lesser problem is that the optional numbers at the start of a line could have appeared between unbalanced braces. Lastly, the grammar did not allow for statements to be properly labelled.

Here is one better solution; there are plenty more like it.

      PRODUCTIONS
        Program     = { [ Statement ] EOL } .
        Statement   = [ CodeAddress ] [ word ] Operation .
        Operation   =   Opcode1
                      | Opcode2 number
                      | Opcode3 ( number | word )
                      | Opcode4 string .
        Opcode1     = "STO" | "INPI" | "LDV" (* all the ones like this *) .
        Opcode2     = "LDC" | "LDA"  (* all the ones like that *) .
        Opcode3     = ( "BZE" | "BRN" ) ( number | word ) .
        Opcode4     = "PRNS" .
        CodeAddress = number | "[" number "]" .
      END Program.

3. Consider the following grammar expressed in EBNF for describing the progress of an excellent compiler course offered at a University Near You:

Course = Introduction Section { Section } Conclusion .
Introduction = [ "lecture" "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(a) Is this an LL(1) grammar? If not, where does it break the rules? [3]

Most people could see that this was not an LL(1) grammer. Introduction is nullable, and

FIRST(Introduction) = { "lecture" },

while

FOLLOW(Introduction) = { "lecture" , "practical", "tutorial" , "handout" , "test" }.

Note that there is no problem with the nullable component of Section, or with the nullable component withing Conclusion. Some students said that because a Section could start with "practical" "test" and be followed by "test" that the grammar was non LL(1), but this is not true. Be careful where and how you test Rule 2.

(b) If it breaks the rules, do you suppose a recursive descent parser would have trouble recognizing the history of a course? Explain. [2]

A point I wish to stress here is that a recursive descent parser can handle some non-LL(1) grammars sensibly. The most obvious example of this is the "dangling else" grammar (go and read that part of the text again!). In this particular case, however, there would be trouble. If the string to be parsed began with "lecture" the parser would not know which route to follow unless it could look ahead to the next token - and even this would not help if the next token were "handout".

To take this further, suppose the grammar had been a very similar one:

Course = Introduction Section { Section } Conclusion .
Introduction = "lecture" [ "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

This is also non-LL(1), as the nullable component within Introduction has the FIRST set { "handout" } which has "handout" in common with the FOLLOW set. However a parser would have no trouble parsing a string like

"lecture" "handout" "lecture" ...

as it would simply associate the handout with an Introduction, and not with a following Section (this is completely analogous with the dangling else problem, of course).

(c) Do you think it is possible to complete the course described by this grammar without attending any lectures? (Explain your reasoning.) [2]

Yes, indeed. A string like

"test" "examination"

is the minimum one that is acceptable to the parser, one which would result if all the nullable options were exercised.

(d) (Bonus mark) So why are some of you missing some of my lectures? [1]

There is no agreement on the correct solution here. Probably the closest is that some people are just too lazy to attend, or have stayed up too late and over sleep. Crazy, really? I'm willing to bet that if you pay for a drink at the Rat you would drink it; since you've paid for a whole lot of drinks at the Fountain of Knowledge, why don't you drink those too? Those who submitted no solution but who are known always to attend earned the bonus mark anyway. this must be the first time I have awarded marks for leaving an answer blank. It is not likely to happen again!

5. Grammars don't get much simpler than this next one - at least to write down. It has one nonterminal A and two terminals ( and )

A = "(" A ")" A | e .

(a) Is it an LL(1) grammar? If not, why not? [4]

Lots of people got this wrong, which suggests that there is still a lot of confusion about where and how to apply tests for the LL(1) condition. The production for A has two alternatives. The FIRST sets for these two alternatives are { "(" } and the empty set F, so Rule 1 is trivially satisfied. Since A is nullable we must test Rule 2 (we only need do this if a non-terminal is nullable; people miss this point). To determine FOLLOW(A) look at the right hand sides of the productions for instances of A. This only happens in the situation where A is followed by ")", so FOLLOW(A) = { ")" } and so Rule 2 is satisfied, and the grammar is LL(1). Many people dreamt up the idea that FOLLOW(A) = { "(" , ")" } but I don't understand why.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's the language of "matched parentheses". Strings of nested parentheses are permitted, such as ( ( ( ) ) ), but so are strings like ( ) ( ) ( ), and of course combinations of these like ( ( ( ) ) ) ( ) ( ( ) ( ) ). Cute, really - amazing what a simple grammar can do. Note that the "language" cannot contain any As in the strings it produces; these can only appear in the intermediate sentential forms.

It cannot be an ambiguous grammar - we have shown it to be an LL(1) grammar, and hence it must be both deterministic and non-ambiguous.

6. Repeat question 5 for a grammar that looks rather similar but does not have an e:

A = "(" A ")" | "(" ")" | A A .

(a) Is it an LL(1) grammar? If not, why not? [2]

This one is not LL(1). There are three alternatives for A, namely "(" A ")" and "(" ")" and AA. All three alternatives have the same FIRST set, namely { "(" } so Rule 1 is broken.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's also the language of "matched parentheses" - the two grammars are equivalent in that sense. I'm surprised so many people did not spot this!

Well, many fell into the trap of saying "if a grammar is non LL(1), it must be ambiguous". That is not correct, though the opposite deduction, namely that "if a grammar is LL(1), it cannot be ambiguous", always holds (funny stuff, logic!). You have seen several examples of non-LL(1) grammars that are non ambiguous, for example the one for Expressions in Chapter 6:

(G1) Expression = Term | Expression "-" Term .       (1, 2)
          Term = Factor | Term "*" Factor .        (3, 4)
          Factor = "a" | "b" | "c" .        (5, 6, 7)
          Factor = "a" | "b" | "c" .       (5, 6, 7)

However, in this case the grammar for matched parentheses is ambiguous. To convince a reader, it is necessary only to find one example of a string that can be parsed in two different ways and show the two ways. ()()() is such a string. This would come about from a sentential form AAA, but this form AAA could have been reached in two ways

A = AA = AA A

or

A = AA = A AA

Before we leave Questions 5 and 6, here is yet another equivalent grammar, this time a variation on the one in question 5

A = A "(" A ")" | e .

Can you see why this one is not LL(1), and also convince yourself that it is non-ambiguous?

7. Time to be more creative.

Steven Spielberg is about to film his latest blockbuster "Elevator 9/11", in which a group of gorgeous Americans (guess what) are trapped in an elevator hundreds of feet above the ground by a group of ugly ... (you guessed right again). He wants to make sure he understands how to arrange for the elevator to get stuck. (So would you if you had the chance to spend a few hours opposite/underneath/on top of some of the hunks and bimbos he has lined up to act for him).

Mr Spielberg has been reliably informed that some high-rise skyscrapers have lifts (elevators) that are based partway up the building, say on level (floor) X. People entering one of these elevators have a choice of pushing an UP key or a DOWN key as many times as they like. If the lift is behaving normally it can go up or down one level for each push of the appropriate key, but it cannot drop below level X, and at the end of the day it is supposed to have returned to level X again. Write down the productions for a simple grammar that will allow Mr S to recognize a sequence of commands that does not trap the lift, such as

UP UP DOWN DOWN or UP UP DOWN UP DOWN UP DOWN DOWN

Sequences that effectively jam the lift, such as

DOWN UP or UP UP DOWN

should be rejected by the grammar. (Hint: Treat UP and DOWN as key words <grin>)

[4]

Oh what fun! This one really sorted you out. As far as I could make out, only two students managed to see through all the hot air to the underlying simplicity. The keypresses have to balance in the same way that the parentheses in the previous questions had to balance, so all it needs is the same grammar, essentially, as for matched parentheses, save that you use UP and DOWN in place of ( and ):

NoJam = "UP" NoJam "DOWN" NoJam | e .

Solutions submiited on the lines of

NoJam = "UP" { NoJam } "DOWN" | "DOWN" { NoJam } "UP" .

were no use, of course, because you were not allowed to drop below the inital level. Solutions like

NoJam = "UP" { "UP" "DOWN" } "DOWN" .

were submitted by some students. This certainly matches the two correct examples suggested, but these people fell into the trap of defining a grammar that only described a subset of the language, and not the complete language.

Home © P.D. Terry