Test for week

Computer Science 3 - 2016 - Test on Week 4

This should take no longer than about 40 minutes and is simply intended to probe whether you have understood the material in the prac questions. You may use your textbook if you think it will help, but not a computer. Answer on the question sheet (4 sides).

1. Sorry, I am getting past it and have forgotten. What are your: (first name) surname student number (in that format)? [2]

(Pat) Terry 63T0844

2. Define concisely the meaning of "ambiguous grammar". [2]

An ambiguous grammar can describe at least one sentence in two different ways, that is, there exists at least one sentence which has two distinct parse trees.

3. Define concisely the conditions that must be satisfied for a grammar to earn the title "LL(1) grammar". [4]

Rule 1
For each non-terminal A_i ∈ N that admits alternatives

A_i → ξ_i1 | ξ_i2 | . . . ξ_in

the sets of initial terminal symbols of all strings that can be generated from each of the alternative ξ_ik must be disjoint, that is

FIRST(ξ_ij) ∩ FIRST(ξ_ik) = ∅ for all j ≠ k

Rule 2
For each non-terminal A_i ∈ N that admits alternatives

A_i → ξ_i1 | ξ_i2 | . . . ξ_in

but where ξ_ik ⇒ ε for some k, the sets of initial terminal symbols of all sentences that can be generated from each of the ξ_ij for j ≠ k must be disjoint from the set FOLLOW(A_i) of symbols that may follow any sequence generated from A_i, that is

FIRST(ξ_ij) ∩ FOLLOW(A_i) = ∅, j ≠ k

or, rather more loosely,

FIRST(A_i) ∩ FOLLOW(A_i) = ∅

4. Can an "LL(1) grammar" be ambiguous? [1]

5. Give an example of a simple grammar that is unambiguous, but is also non-LL(1). [3]

It is a Good Idea to have one of these committed to memory to handle this silly sort of question. There are many examples, of which the simplest may be

A = a [ B ] b.
B = b | c .

which breaks Rule 2 (but not Rule 1). There are only three possible sentences - ab , abb and acb and each can be derived in only one way.

A more sophisticated answer might be to quote the left resursive grammar for simple algebraic expressions discussed earlier in the course.

Time to stop watching videos, playing computer games, and checking FB - listen to steam radio

Consider how we might describe the daily activities of SAFM - "your news and information leader" (more realistically described as a non-stop party political broadcast). Hour by hour Bongi, Tsepiso, Nancy, Ashraf, Elvis, Sakina and their comrades present what are generously called shows. These include music and advertisements when there is nothing more interesting. What passes as "information" is provided by news bulletins, weather reports and talk shows. News bulletins consist of a homogeneous stream of stories, while talk shows consist of interchanges between a host and any listeners who are keen enough to phone in - with the announcer hosting the talk show getting the first and last word on the subject, of course.

The staff at Auckland Park have made an attempt to describe a typical day's activities as shown below. Items like "advert", "rain" and "music" are really in the category of the lexical terminals which a scanner (in the person of someone listening to the radio) would recognize as key symbols while parsing a broadcast.

     Radio        =   { TalkShow | NewsBulletin | "music" | "advert" } .
     NewsBulletin =   "advert" NewsItem { NewsItem } [ Weather ] Filler .
     NewsItem     =      "ANC" [ "Zuma" ] | "strike" | [ "Zuma" ] "ANC" | "Wayde"
                      | "corruption" | "LGE" { "LGE" } | "Oscar" | Accident .
     TalkShow     =   {"host" "listener" } "host" [ Filler ] .
     Accident     =   "another" ( number "lives" | "life" ) "lost" .
     Filler       =   "advert" | "music" .
     Weather      =   { ( "snow" | "rain" | "cloudy" | "windy" ) "in" province } .

6. Read the productions very carefully and then add a few lines of code so as to produce a complete Cocol grammar which also incorporates the directives or pragmas that request Coco/R to test the grammar and to evaluate the FIRST and FOLLOW sets for the non-terminals. [8]

Something like this would suffice. Or one couuld introduce something like

province = "WC" | "EC" | "NC" | "GP" | "FS" | KZN | "MP" | "LM" | "NC" .

to complete the TOKEN or PRODUCTIONS section.

    COMPILER Radio $TF
    CHARACTERS
      letter = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      digit  = "01234567890" .
    TOKENS
      number   = digit { digit } .
      province = letter { letter } .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Radio        = { TalkShow | NewsBulletin | "music" | "advert" }   .
      NewsBulletin = "advert" NewsItem { NewsItem } [ Weather ] Filler .
      NewsItem     =    "ANC" [ "Zuma" ] | "strike" | [ "Zuma" ] "ANC" | "Wayde"
                      | "corruption" | "LGE" { "LGE" } | "Oscar" | Accident .
      TalkShow     = {"host" "listener" } "host" [ Filler ] .
      Accident     = "another" ( number "lives" | "life" ) "lost" .
      Filler       = "advert" | "music" .
      Weather      = { ( "snow" | "rain" | "cloudy" | "windy" ) "in" province } .
    END Radio.

7. What are the sets FIRST(Weather) and FOLLOW(Weather)? [4]

FIRST( Weather ) = {"snow" , "rain" , "cloudy" , "windy" }
FOLLOW( Weather ) = {"music", "advert" }

8. As it is expressed here, which (if any) of the seven non-terminals would be regarded as "nullable"? [4]

Radio is nullable (The Radio does not have to be turned on!)
Weather is nullable

Several of the others have "nullable sections", but all derive at least one terminal when applied. It may help the discussion that follows to introduce two more non terminals - Programme and LocalWeather.

    PRODUCTIONS
      Radio        = { Programme } .
      Programme    = TalkShow | NewsBulletin | "music" | "advert" .
      NewsBulletin = "advert" NewsItem { NewsItem } [ Weather ] Filler .
      NewsItem     =    "ANC" [ "Zuma" ] | "strike" | [ "Zuma" ] "ANC" | "Wayde"
                      | "corruption" | "LGE" { "LGE" } | "Oscar" | Accident .
      TalkShow     = {"host" "listener" } "host" [ Filler ] .
      Accident     = "another" ( number "lives" | "life" ) "lost" .
      Filler       = "advert" | "music" .
      Weather      = { LocalWeather }
      LocalWeather = ( "snow" | "rain" | "cloudy" | "windy" ) "in" province .
    END Radio.

9. HM, the COO of the SABC, asked a graduate of CSC 301 to use Coco to analyse this grammar. To his horror he was told that this turned up 8 (eight) distinct "LL(1) warnings". He demanded an explanation for each one, presumably so that #HeadsMustRoll. Can you predict at least seven of these and explain them (without using Coco for yourself). [24]

Of the productions given above, those for Accident, Filler and LocalWeather clearly obey the rules.

Rule 1 is obviously broken in two places:

(a) One of the options for Programme, that is NewsBulletin, starts with "advert", which is an option in its own right for Programme.

(b) Since [ "Zuma" ] is nullable, two of the options in NewsItem can both start with "ANC"

To check Rule 2 using EBNF conventions, we need to look at the nullable sections - which may or may not contain nullable nonterminals within them.

1 Radio
2 Weather
3 { NewsItem }
4 [ Weather ]
5 [ "Zuma" ]
6 { "LGE" }
7 [ Filler ]
8 { "host" "listener" }

As mentioned in the book, if we have productions of the form

A = α [ β ] γ
D = α { β } γ

then Rule 2 essentially demands that FIRST(β) ∩ FIRST(γ) = ∅ . This is not the same as demanding that FIRST(β) ∩ FOLLOW(β) = ∅. Indeed, if the { } brackets are to be of any use in generating a sequence of several βs , then FOLLOW(β) must contain FIRST(β) (although it might also contain extra terminals by virtue of β appearing in other places on the right side of the production sets).

Several submissions fell foul of a misconception that an element like { β } must always render a grammar non-LL(1) because of this repetition.

If within a { β } or [ β ] component β itself is non-nullable and conforms to the first rule, then the second rule cannot be broken by this element itself. But if it contains other nullable components, problems might arise. Consider, for example

A = α { β [ γ ] } δ .

This could generate/expand to something like

A = α (β [ γ ]) (β [ γ ]) δ = a b [ γ ] β [ γ ] δ = a b ( [ γ ] β ) ( [ γ] δ )

and the second rule would require FIRST(γ) ∩ FIRST(β) = ∅ and also FIRST(γ) ∩ FIRST(δ) = •

There is another trap that one might easily fall into. A grammar with productions that contain components like [ [ α ] ] or [ { α } ] or { [ α ] } is ambiguous, to say the least, and Coco will reject it as well. This fault is present in the grammar above (point 4) - and was the eighth warning that Coco gave, one that I suspected few people would find for themselves.

All of which should make you even more grateful for tools like Coco!

Turning now to the grammar in the problem, a look at the productions for Radio and Weather allow for repetitions of non-nullable elements, which cause no harm. But

        TalkShow = { "host" "listener" } "host" [ Filler ] .

shows that, not only is Rule 2 broken for the nullable section { "host" "listener" }, but [ Filler ] is nullable, and this combined with the production for Programme shows that Rule 2 must be broken for this nullable component as well (items 7 and 8 above).

Of the remaining elements identified above, the nullable { NewsItem } allows for a repetition of various non-null elements and by itself seems to cause no trouble other than the Rule 1 violation already discussed. But the trailing [ "Zuma" ] and { "LGE" } components give rise to examples of the sort of issue raised above in the discussion of an element like { β [ γ ] } δ, and with a little thought you should see the Rule 2 violations quite clearly

For the record, submitting the grammar to Coco/R produced warnings like these:

Warning: Radio is deletable Warning: Weather is deletable

LL1 warning in NewsBulletin: the contents of [...] or {...} must not be deletable (Rule 2)

LL1 warning in Programme: "advert" is the start of several alternatives (Rule 1) LL1 warning in NewsItem: "ANC" is the start of several alternatives (Rule 1)

LL1 warning in TalkShow: "host" is the start & successor of a deletable structure (Rule 2) LL1 warning in TalkShow: "music" is the start & successor of a deletable structure (Rule 2) LL1 warning in TalkShow: "advert" is the start & successor of a deletable structure (Rule 2)

LL1 warning in NewsItem: "Zuma" is the start & successor of a deletable structure (Rule 2) LL1 warning in NewsItem: "LGE" is the start & successor of a deletable structure (Rule 2)

10. Being a good politician, HM hoped that all these warnings must mean that this description was in fact ambiguous. His logic might or might not be correct, but can you put him at ease and confirm his suspicion? (He will need convincing, merely saying yes or no won't satisfy him.) [3]

You were being asked to produce at least one sentence in two different ways. There were many possible examples, but it was distressing that so few people realised that this was what they had to do - and of course to demonstrate the alternative derivations. Here are the important parts of two examples

{ NewsItem } = (NewsItem ) (NewsItem) = ( "ANC" "Zuma" ) ( ε "ANC" ) = "ANC" "Zuma" "ANC"
{ NewsItem } = (NewsItem ) (NewsItem) = ( "ANC" ε ) ( "Zuma" "ANC" ) = "ANC" "Zuma" "ANC"

(Rule 1 transgression)

NewsItem ε = ( "LGE" "LGE" ) ε = "LGE" "LGE"
NewsItem NewsItem = ( "LGE" ε ) ( "LGE" ε) = "LGE" "LGE"

(Rule 2 transgression)

11, Clearly the SABC should be restructured to avoid all these potential problems. Can you write a grammar similar to the one suggested here that would pass a Coco test and would adequately describe a day's activities? REMINDER - in Terry-speak, "Can you write" doesn't just mean "answer yes or no". It means - "if yes, actually write the new set of productions" else "if no, explain exactly why not".

How, if at all, can one find a better grammar? Several submissions simply gave up at this point, but their authors often gave very specious (that is, indefensible) reasons. Just because a grammar is not LL(1) is no guarantee that you will not be able to find an equivalent LL(1) grammar - but by the same token, it is possible to find many non-LL(1) grammars that can be converted to LL(1) grammars quite easily (go and read section 7.3 of the textbook again), although in some cases this is not possible (read the discussion about the "dangling else" again).

Some of the problems in the grammar are easily overcome. If we change the production for TalkShow to

        TalkShow = "host" { "listener" "host" } .

we remove one of the Rule 2 problems, and we do not lose anything at all - if a Filler had been present it would have shown up as part of the next item.

We can remove the Rule 1 problem involving "advert" by a simple trick which is useful on many occasions - delay the factorization of the alternatives that cause problems. At the same time we remove the ambiguous nonsense generated by the element [ Weather ] :

        Radio      = { TalkShow | "music" | "advert" [ RestOfNews ] } .
        RestofNews = NewsItem { NewsItem } Weather Filler .

Conceptually a NewsBulletin in the old form - starting with an "advert" - is still required.

The Rule 2 complication with "LGE" is easily overcome by simply removing the { "LGE" } element.

How do we get the problematic "Zuma" to go away? A careful look at the production for NewsItem has revealed that the grammar is actually ambiguous - it appears we are allowed to parse a substring "ANC" "Zuma" "ANC" in two ways (either as ("ANC" "Zuma") ( ε "ANC" ) or as ("ANC" ε ) ( "Zuma" "ANC"). Now if we really want to find a grammar that is ambiguous (bearing in mind the rude remarks made in lectures that politicians love ambiguity, we might just want to do so!) we shall not be able to do this in an LL(1) compliant way. Trying to get behind the intent of the problem might suggest that we want a grammar in which it is possible to have news items on the "ANC" without mentioning "Zuma" in the same breath, but if we ever mention "Zuma" it must either have been just after mentioning "ANC", or (if we had not done so), we have no option but to mention "ANC" straight afterwards.

One way to try to do this is to write the production:

        NewsItem  =   "ANC" [ "Zuma" ] | "strike" | "Zuma" "ANC"
                    | "Wayde" | "corruption" | "Zuma" "music" | Accident .

which gets rid of the Rule 1 violation, but not the Rule 2 violation. However, this one might not be serious - it is, in fact, exactly the same situation as the "dangling else" and a practical parser would resolve it in the same way that we discuss in lectures - that is, a news item on "Zuma" immediately following "ANC" would be bound to that story on "ANC", and not to one still to follow. So, presented with

"ANC" "Zuma" "ANC" "Wayde"

would be handled as syntactically equivalent to

("ANC" "Zuma") ("ANC") ("Wayde")

and not as

("ANC") ("Zuma" "ANC") "Wayde"

Of course, in a sense, this behaviour of the parser might not be what was really wanted, and the grammars are no longer equivalent - this is a situation where, in a sense, the semantics have become "context sensitive". In the case of the "dangling else", the recursive descent parser works very well, and if one wants the opposite sort of effect, one can resort to code in { braces }.

Some suggested that the production just be written even more simply

        NewsItem  =   "ANC" | "strike" | "Zuma" |
                    | "Wayde" | "corruption" | "Oscar" | "LGE" | Accident .

but no - one could then have news bulletins where "Zuma" is mentioned without mention of the "ANC" at all or the "ANC" without mention of "Zuma". Bliss?