Practical for week 22

Computer Science 3 - 2011 - Test on Prac 22 - Solutions

This should take no longer than about 35 minutes and is simply intended to probe whether you have understood the material in the prac questions. You may use your textbook if you think it will help. Answer on the questions sheet (4 sides).

Clearly the IS project had a bad effect on some people's CS studies this week. If you are in that category, please catch up and study these solutions carefully, and if you don't understand, come and ask for help. I hope the discussion in class helped; I enjoyed it.

1. Sorry, I am getting past it and have forgotten again. What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. A very quick and dirty solution to the stack assembler grammar was picked out of the rubbish bin early this morning. The PRODUCTIONS section read

   PRODUCTIONS
     Program = { Statement } .
     Statement = [ [ "{" ] number [ "}" ] ] Opcode [ number | string | word ] EOL .
     Opcode = "LDA" | "LDV" | "BZE" | (* all the rest were there ... *)
              "STO" | "INPI" | "PRNS" .
   END Program.

What critical message do you think the marker would have scribbled on this, if it had been handed in? (Assume that the TOKENS, COMMENTS, CHARACTERS and IGNORE sections are correct - focus your attention simply on the PRODUCTIONS as given here). [3]

Clearly this one, at least, had been encountered in the practical itself. The worst problem is that no distinction is drawn between the various classes of opcode and as this is so easily specified, it should have been done. A lesser problem is that the optional numbers at the start of a line could have appeared between unbalanced braces. Lastly, the grammar did not allow for statements to be properly labelled. Some submissions remarked that the grammar allowed a totally empty program to be written. Often these grammars permit "nothing at all", perhaps more "for completeness" (as mathematicians put it); there is nothing inherently wrong and of course one could always demand

   PRODUCTIONS
     Program = Statement { Statement } .
     Statement = ...
   END Program.

if you want to insist on at least one Statement (though one statement probably still makes for a pretty useless program!).
Here is one better solution; there are plenty more like it.

PRODUCTIONS Program = { [ Statement ] EOL } . // this allows blank lines Statement = [ CodeAddress ] [ word ] Operation . Operation = Opcode1 | Opcode2 number | Opcode3 ( number | word ) | Opcode4 string . Opcode1 = "STO" | "INPI" | "LDV" (* all the ones like this *) . Opcode2 = "LDC" | "LDA" (* all the ones like that *) . Opcode3 = ( "BZE" | "BRN" ) ( number | word ) . Opcode4 = "PRNS" . CodeAddress = number | "{" number "}" . END Program.

3. Consider the following grammar expressed in EBNF for describing the progress of an excellent compiler course offered at a University Near You:

Course = Introduction Section { Section } Conclusion .
Introduction = "handout" [ "lecture" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(a) Is this an LL(1) grammar? If not, where does it break the rules? [3]

Most people seemed to see that this was not an LL(1) grammar, but did not do the analysis correctly. The component [ "lecture" ] in Introduction is nullable, and so we must look at the intersection of

FIRST( [ "lecture" ] ) = { "lecture" },

with

FOLLOW( [ "lecture" ] ) = FOLLOW(Introduction)
= FIRST(Section) È FOLLOW(Section) (as Section is also nullable)
= { "lecture" , "practical", "tutorial", "handout" } È {"test" }.

which is the singleton set { lecture }, hence rendering the whole grammar non-LL(1) .

Note that Introduction is NOT nullable (some people thought it was), and that there is no problem with the nullable component of Section, or with the nullable component within Conclusion. Some students said that because a Section could start with "practical" "test" and be followed by "test" that the grammar was non-LL(1), but this is not true. Be careful where and how you test Rule 2.

(b) If it breaks the rules, do you suppose a recursive descent parser would have trouble recognizing the history of a course? Explain. [2]

A point I wish to stress here is that a recursive descent parser can handle some non-LL(1) grammars sensibly. The most familiar example of this is the "dangling else" grammar (go and read that part of the text again!). In this particular case, too, there would be no trouble. If the string to be parsed began with "handout" "lecture" the parser would simply associate the lecture with an Introduction, and not with a following Section (this is completely analogous with the dangling else problem, come to think of it). Of course, in practice one might argue that the format or purpose of the first lecture in a course was (or was not) different from the format of subsequent lectures, but remember that we are still dealing only with simple semantics, seductively semantically suggestive though our non-terminal names might be. Of course, we can get an equivalent - but this time LL(1) - grammar if we define it as follows:

Course = Introduction Section { Section } Conclusion .
Introduction = "handout" .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(c) Do you think it is possible to complete the course described by this grammar without attending any lectures? (Explain your reasoning.) [2]

Yes, indeed. "lecture" only appears within nullable components of the grammar.

(d) What is the shortest form the course could take? [2]

"handout" "test" "examination"

is the minimum one that is acceptable to the parser, one which would result if all the nullable options were exercised.

(d) (Bonus mark) So why are a few of you missing some of my lectures? [1]

There is no agreement on the correct solution here. Probably the closest is that some people are just too lazy to attend, or have stayed up too late and over sleep. Crazy, really? I'm willing to bet that if you pay for a drink at the Rat you would drink it; since you've paid for a whole lot of drinks at the Fountain of Knowledge, why don't you drink those too? Those who submitted no solution but who are known always to attend earned the bonus mark anyway. This must be one of very few times I have awarded marks for leaving an answer blank. It may never happen again!

5. Grammars don't get much simpler than this next one - at least to write down. It has one nonterminal A and two terminals ( and )

A = "(" A ")" A | e .

(a) Is it an LL(1) grammar? If not, why not? [4]

Lots of people got this wrong, which suggests that there is still a lot of confusion about where and how to apply tests for the LL(1) condition. The production for A has two alternatives. The FIRST sets for these two alternatives are { "(" } and the empty set F, so Rule 1 is trivially satisfied. Since A is nullable we must test Rule 2 (we need do this only if a non-terminal is nullable; people miss this point). To determine FOLLOW(A) look at the right hand sides of the productions for instances of A. This occurs only in the situation where A is followed by ")", so FOLLOW(A) = { ")" } and so Rule 2 is satisfied, and the grammar is LL(1). Some people dream up the idea that FOLLOW(A) = { "(" , ")" } but I don't understand why.

(b) What language do you suppose it describes (write down a few strings derived from A and see the hopefully obvious pattern!) [3]

It's the language of "matched parentheses". Strings of nested parentheses are permitted, such as ( ( ( ) ) ), but so are strings like ( ) ( ) ( ), and of course combinations of these like ( ( ( ) ) ) ( ) ( ( ) ( ) ). Cute, really - amazing what a simple grammar can do. Note that the "language" cannot contain any As in the strings it produces; these can only appear in the intermediate sentential forms. This mistake was made by several people.

It cannot be an ambiguous grammar - we have shown it to be an LL(1) grammar, and hence it must be both deterministic and non-ambiguous. On the other hand, an equivalent grammar (in the strict sense of "equivalent", namely that it generates the same strings) would be

A = "(" A ")" A A | e .

which is non-LL(1), because for this grammar A is still nullable, with FIRST(A) = { "(" } and FOLLOW(A) = { "(" , ")" }.

Although not all non-LL(1) grammars are necessarily ambiguous, this one would be, by virtue of the fact that we have two nullable sections A next to one another. Determining whether an arbitrary complicated grammar is ambiguous might be difficult (a bit of "luck" in spotting or thinking of a rogue sentence helps!), but the presence of two nullable component in succession might be something to look out for.

5. A weary CSC 301 student has been set the task of deriving a grammar to describe a list of trains. Trains fall into three categories - passenger trains, freight trains, and mixed passenger and freight trains. A train may have one (or more) locomotives. Behind the locomotive(s) in mixed and freight trains come one or more freight trucks. A freight train is terminated by a guard's van; a mixed train on the other hand follows the freight trucks with one or more passenger coaches, the last of which should be a passenger brake van. A passenger train has no freight trucks, but only a sequence of coaches and the passenger brake van behind the locomotive(s). Freight trucks come in various forms - open trucks, fuel trucks, coal trucks, cattle trucks and cold trucks. In the interests of safety, there is a regulation to the effect that fuel trucks may not be marshalled immediately behind the locomotives, or immediately in front of a passenger coach.

After many hours of wrestling with this she has come up with the following idea (which happens to be a valid grammar to describe such a train). But has she produced an LL(1) grammar? Justify your answer in detail (don't just guess or wave your arms!) [8]

     COMPILER Train $CN
     /* Grammar for simple railway train
        P.D. Terry, Rhodes University, 2011 */

     IGNORECASE
     COMMENTS FROM "(*" TO "*)" NESTED
     IGNORE CHR(0) .. CHR(31)

     PRODUCTIONS
       Train        = LocoPart [ SafeLoad | "brake" | Passengers ] "." .   // 1
       LocoPart     = "loco" { "loco" } .                                  // 2
       Passengers   = { "coach" } "guard" .                                // 3
       SafeLoad     = SafeTruck RestSafeLoad .                             // 4
       RestSafeLoad = SafeLoad | "guard" | Passengers | SafeFuel .         // 5
       SafeFuel     = FuelTruck { FuelTruck } ( SafeLoad | "brake" ) .     // 6
       SafeTruck    = "coal" | "open" | "cattle" | "closed" .              // 7
       FuelTruck    = "fuel" .                                             // 8
     END Train.

As mentioned in class, I had mistyped the question. That means that the grammar does not, in fact, represent the solution correctly, but no harm is done in examining the LL(1) aspects of the grammar as set (later I look at the very similar grammar that I had oiriginally intended).

These problems can be solved in two ways. One is to transform the productions in meta-bracket-free equivalents. That is the safest way, but it is (a) tedious and (b) errorprone and (c) you might not end up with an exactly equvalent grammar. The other way is to look at the various components of the EBNF and consider which of these might give Rule 1 or Rule 2 problems. Let's try that here. We have to consider each of the eight productions in turn, and be careful not to confuse the issue with comparisons between sets that have nothing to do with one another!

In the grammar above, productions for the non-terminals LocoPart, Passengers, SafeLoad and FuelTruck have no components that have alternatives. So we don't need to worry about Rule 1 for them. Productions for Train, RestSafeload, SafeFuel and SafeTruck do have alternatives, so we must be careful.

For the section

SafeLoad | "brake" | Passengers

in Production 1 for Train there is no problem.

FIRST(SafeLoad) = FIRST(SafeTruck) = { "coal", "open", "cattle", "closed" "}
FIRST("brake") = { "brake" }
FIRST(Passengers) = { "coach", "guard" }

Pair-wise all three intersections of interest are empty.
For the section RestSafeLoad in Production 5 there are four alternatives: SafeLoad, "guard", Passengers and SafeFuel. The FIRST sets for these are

First(SafeLoad) = First(SafeTruck) = { "coal", "open", "cattle", "closed" }
First("guard") = { "guard" }
First(Passengers} = { "coach", "guard" }
First(SafeFuel) = First(FuelTrucl) = { "fuel" }

which shows that the grammar is non-LL(1) because "guard" can start two alternatives.
In Production 6 for SafeFuel the section

( SafeLoad | "brake" )

clearly throws up no problems. FIRST(SafeLoad) = { "coal", "open", "cattle", "closed" "} FIRST("brake") is simply { "brake" }
For Production 7 there is clearly no problem - the FIRST sets are singleton sets and no pair has the same member. Once the grammar is broken once, it is broken! For completeness we should consider the implications of Rule 2 which must be applied to the nullable sections { "loco"} in Production 2, { "coach } in Production 3, and { "fuel" } in Production 6. In detail

FIRST( { "loco" } ) = { "loco" }
FOLLOW( { "loco" } ) = { "coal", "open", "cattle", "closed", "fuel", "brake", "coach", "guard" }

FIRST( { "coach" } ) = { "coach" }
FOLLOW( { "coach" } ) = { "guard" }

FIRST( { "fuel" } ) = { "fuel" }
FOLLOW( { "fuel" } ) = FOLLOW( { FuelTruck } = FIRST(SafeLoad) È FIRST( "brake" }
= { "coal", "open", "cattle", "closed", "brake" }

none of which cause further problems.

As a sidetrack, here is the grammar I had meant to use in the question. It is almost the same - except for Production 5:

    PRODUCTIONS
      Train        = LocoPart [ SafeLoad | "brake" | Passengers ] "." .   // 1
      LocoPart     = "loco" { "loco" } .                                  // 2
      Passengers   = { "coach" } "guard" .                                // 3
      SafeLoad     = SafeTruck RestSafeLoad .                             // 4
      RestSafeLoad = SafeLoad | "brake" | Passengers | SafeFuel .         // 5
      SafeFuel     = FuelTruck { FuelTruck } ( SafeLoad | "brake" ) .     // 6
      SafeTruck    = "coal" | "open" | "cattle" | "closed" .              // 7
      FuelTruck    = "fuel" .                                             // 8
    END Train.

Let us illustrate the other way of solving the problem, by first rewriting the grammar without using the meta- brackets:

    PRODUCTIONS
      Train        = LocoPart Payload "." .                        /* 1           */
      Payload      = SafeLoad | "brake" | Passengers |  .          /* 2 3 4       */
      LocoPart     = "loco" Locos .                                /* 5           */
      Locos        = "loco" Locos | .                              /* 6 7         */
      Passengers   = Coaches "guard" .                             /* 9           */
      Coaches      = "coach" Coaches | .                           /* 9 10        */
      SafeLoad     = SafeTruck RestSafeLoad .                      /* 11          */
      RestSafeLoad = SafeLoad | "brake" | Passengers | SafeFuel .  /* 12 13 14    */
      SafeFuel     = FuelTruck Fuels FuelTail .                    /* 15          */
      FuelTail     = SafeLoad | "brake" .                          /* 16 17       */
      Fuels        = "fuel" Fuels | .                              /* 18 19       */
      SafeTruck    = "coal" | "open" | "cattle" | "closed" .       /* 20 21 22 23 */
      FuelTruck    = "fuel" .                                      /* 24          */
    END Train.

To check Rule 1 we look at the productions for Payload, RestSafeLoad, FuelTail, SafeTruck. Several of these obviously satisfy Rule 1, but for the record here are the full details:

First(Payload1) = FIRST(SafeLoad) = { "coal", "open", "cattle", "closed" }
First(Payload2) = { " guard" }
First(Payload3) = FIRST(Passengers) = { "guard", "coach" }

First(RestSafeLoad1) = FIRST(SafeLoad) = { "coal", "open", "cattle", "closed" }
First(RestSafeLoad2) = { "brake" }
First(RestSafeLoad3) = FIRST(Passengers) = { "guard", "coach" }
First(RestSafeLoad4) = First(SafeFuel) = { "fuel" }

First(FuelTail1) = FIRST(SafeLoad) = { "coal", "open", "cattle", "closed" }
First(FuelTail2) = { "brake" }

These productions all satisfy Rule 1; and the productions for SafeTruck clearly satisfy Rule 1, as the alternatives are all simple distinct terminals.
To check Rule 2 we must examine FIRST and FOLLOW sets for the nullable non-terminals Payload, Locos, Coaches, Fuels. These are easily seen to be

FIRST(Payload) = { "brake", "guard", "coach", "coal", "open", "cattle", "closed" }
FOLLOW(Payload" = { "." }

FIRST(Locos) = { "loco" }
FOLLOW(Locos) = { ".", "brake", "guard", "coach", "coal", "open", "cattle", "closed" }

FIRST(Coaches) = { "coach" }
FOLLOW(Coaches) = { "guard" }

FIRST(Fuels) = { "fuel" }
FOLLOW(Fuels) = { "brake", "coal", "open", "cattle", "closed" }

so Rule 2 is satisfied for all the nullable non-terminals in this meta-bracket-free equivalent grammar.