Solutions - Tutorial 4

Computer Science 301 - 2014

Tutorial 4 - Solutions - Practice in checking LL(1) compliance

You can find the source for these grammars on the WWW page in the file tut4.zip.

Are the following grammars LL(1) compliant? If not, why not?

    COMPILER A $TF

      PRODUCTIONS
        A = B ( "x" | "z" ) ( "w" | "z" ) B .
        B = "x" B "z" | { "y" } .

    END A.

The best way to solve these problems may be to eliminate the meta-symbols. For example:

        A = B C D B .            1
        C = "x" | "z" .          2
        D = "w" | "z" .          3
        B = "x" B "z" | E .      4
        E = ε | "y" E .          5

Of these productions 2 - 5 offer alternatives and so should be checked for transgressions of "Rule 1". 4 and 5 show that the non-terminals B and E are nullable, and so these non-terminals should be checked for transgressions of "Rule 2".

Rules 2 - 5 come through the Rule 1 test with flying colours:

     FIRST(C1) = { "x" }     FIRST(C2) = { "z" }.   Intersection is empty.
     FIRST(D1) = { "w" }     FIRST(D2) = { "z" }.   Intersection is empty.
     FIRST(B1) = { "x" }     FIRST(B2) = { "y" }.   Intersection is empty.
     FIRST(E1) =  Empty      FIRST(E2) = { "y" }.   Intersection is empty.

For the nullable non-terminals

     FIRST(B) = { "x", "y" }    FOLLOW(B) = FIRST(C) U { "z" } = { "x", "z" }

so Rule 2 is broken for B.

     FIRST(E) = { "y" }         FOLLOW(E) = FOLLOW(B) = { "x", "z" }

so Rule 2 is satisfied for E.

   COMPILER S $FT

     PRODUCTIONS
       S = A | B .
       A = C D | E "f" .
       B = "d" | "e" .
       C = [ "a" | "b" ] .
       D = [ "c" | "e" ] .
       E = "f" | "g" .

   END S.

Rewrite the productions:

       S = A | B .               1
       A = C D | E "f" .         2
       B = "d" | "e" .           3
       C = "a" | "b" | ε         4
       D = "c" | "e" | ε         5
       E = "f" | "g" .           6

All of these offer alternatives, C and D are clearly nullable, and so A and hence also S are nullable;

     FIRST(E1) = { "f" }     FIRST(E2) = { "g" }.   Intersection is empty.
     FIRST(D1) = { "c" }     FIRST(D2) = { "e" }.   Intersection is empty.
     FIRST(C1) = { "a" }     FIRST(C2) = { "b" }.   Intersection is empty.
     FIRST(B1) = { "d" }     FIRST(B2) = { "e" }.   Intersection is empty.
     FIRST(A1) = FIRST(C D) = FIRST(C) U FIRST(D) = { "a", "b", "c", "e" }
     FIRST(A2) = FIRST(E "f" ) = FIRST(E) = { "f", "g" }.   Intersection is empty.
     FIRST(S1) = FIRST(A) = FIRST(A1) U FIRST(A2) = { "a", "b", "c", "e", "f", "g" }
     FIRST(S2) = FIRST(B) = { "d", "e" }. Intersection { "e" } - Rule broken.

For the nullable non-terminals

     FIRST(C) = { "a", "b" }    FOLLOW(C) = FIRST(D) = { "c", "e" }
     FIRST(D) = { "c", "e" }    FOLLOW(D) = Empty
     FIRST(A) = { "a", "b", "c", "e", "f", "g" } FOLLOW(A) = Empty
     FIRST(S) = { "a", "b", "c", "d", "e", "f", "g" } FOLLOW(S) = Empty

so Rule 2 is satisfied for all nullable non-terminals. (One might argue that FIRST(S) and FOLLOW(S) both contain EOF (and similarly FIRST(A) and FOLLOW(A)), but we do not have to worry about those case. EOF is not a "real" terminal, and would be "scanned" in a different way. It appears in various places in the text for clarification only.)

   COMPILER Z $TF

     PRODUCTIONS
       Z = B C | D E .
       B = [ E ] .
       C = "x" E .
       D = E { Z } ( E | "w" ) .
       E = "y" | "z" .

   END Z.

Rewrite the productions:

       Z = B C | D E .           1
       B =  E | ε .              2
       C = "x" E .               3
       D = E F G .               4
       F = ε | Z F .             5
       G = E | "w" .             6
       E = "y" | "z" .           7

Productions 2, 5, 6, 7 have alternatives that pretty obviously satisfy Rule 1. Production 1 needs more analysis.

    FIRST(Z1) = FIRST(B C) = FIRST(E) U FIRST(C) = { "y", "z", "x" }
    FIRST(Z2) = FIRST(D E) = FIRST(D) = { "y", "z" )

So rule 1 is broken.

B and F are nullable

    FIRST(B) = FIRST(E) = { "y" , "z" }
    FOLLOW(B) = FIRST(C) = { "x" }

so Rule 2 is satisfied for B

    FIRST(F) = FIRST(Z) = { "x", "y", "z" }
    FOLLOW(F) = FIRST(G) = { "w", "y", "z" }

so Rule 2 is broken for F

The following describes a silly computer language. Study it carefully and decide whether it is LL(1) compliant. If not, explain why, and make suggestions as to how it might be slightly modified in that case so that a recursive descent parser could be constructed for it.

   COMPILER Block $TF

     CHARACTERS
       letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

     TOKENS
       identifier = letter { letter } .

     IGNORE CHR(0) .. CHR(31)

     PRODUCTIONS
       Block             = "BEGIN" StatementSequence "END" .
       StatementSequence = Statement { ";" Statement } .
       Statement         = [   Block | Declaration | Assignment
                             | WhileStatement | DoStatement | IfStatement ].
       Declaration       = identifier { "," identifier } ":" Type .
       IfStatement       = "IF" Expression
                             [ "THEN" ] StatementSequence
                             [ "ELSE" StatementSequence ] .
       Expression        = identifier . /* for illustration only */
       WhileStatement    = "WHILE" Expression "DO" StatementSequence "END" .
       DoStatement       = "DO" StatementSequence "WHILE" Expression  .
       Assignment        = identifier ":=" Expression .
       Type              = "INTEGER" | "BOOLEAN" | "CHAR" .
   END Block.

A full answer will not be provided here. You are invited to work through it in detail, and to check your work by submitting the grammar to Coco/R, which will also provide lists of FIRST and FOLLOW sets if you include an F in the $Pragma string at the top of the grammar.

The grammar is non-LL(1) because

Two of the Statement alternatives both start with identifier.
ELSE is the start of a nullable string, and can also follow this - the dangling ELSE problem again (see text book).
The ; is the start and successor of a nullable section.
WHILE is the start and successor of a nullable section.

Would it perhaps have been wiser to try to write the grammar with the following alternatives for IfStatement?

       IfStatement = "IF" Expression
                     ( Statement | "THEN" Statement "ELSE" Statement" ) .

That would work, but it would be a nasty language feature, I think.

As a more interesting example of applying an analysis to a grammar expressed in EBNF, let us consider how we might describe the theatrical production of a Shakespearian play with five acts. In each act there may be several scenes, and in each scene appear one or more actors, who gesticulate and make speeches to one another (for the benefit of the audience, of course). Actors come onto the stage at the start of each scene, and come and go as the scene proceeds - to all intents and purposes between speeches - finally leaving at the end of the scene (in the Tragedies some may leave dead, but even these usually revive themselves in time to go home). Plays are usually staged with an interval between the third and fourth acts.

Actions like "speech", "entry" and "exit" are really in the category of the lexical terminals which a scanner (in the person of a member of the audience) would recognize as key symbols while watching a play.

Analyse this grammar in detail. If it proves out to be non-LL(1), try to find an equivalent that is LL(1), or argue why this should be impossible.

To check Rule 1 we look at the productions for each non-terminal that yields alternatives on its right hand side. There is seemingly only one non-terminal in this category here, namely Action. So we have to look at

      FIRST("speech")  ∩  FIRST("entry")
      FIRST("speech")  ∩  FIRST("exit")
      FIRST("speech")  ∩  FIRST("death")
      FIRST("speech")  ∩  FIRST("gesticulation")

etc. We do not have to look at FIRST(Action) ∩ anything!

Clearly Rule 1 is satisfied for any pairwise combination taken from the options in the productions for Action.

One has to be careful, very careful, about Rule 2. Rule 2 applies when there are ε productions and alternatives. It does not apply to the productions for Action, and so some people will be lulled into thinking that (since this is the only place where alternative right hand sides are explicitly given), that Rule 2 is never broken. Alas, not so. When you use EBNF and use the useful [ option ] and { repetition } constructs, there are implicit ε's that creep in. If we eliminate the metabrackets we get something like

      Play        = Act Act Act "interval" Act Act .
      Act         = Scene RestOfScene .
      Scene       = Prologue "entry" Actions .
      RestOfScene = Scene RestOfScene | ε .
      Prologue    = "speech" Prologue | ε .
      Actions     = Action Actions | ε .
      Action      = "speech" | "entry" | "exit" | "death" | "gesticulation" .

in which the following now have to be checked for violations of Rule 2

            Productions                               Check

      RestOfScene = Scene RestOfScene | ε .    (FIRST(RestOfScene) ∩ FOLLOW(RestOfScene) )
      Prologue = "speech" Prologue | ε .       (FIRST(Prologue) ∩ FOLLOW(Prologue) )
      Actions = Action Actions | ε .           (FIRST(Actions) ∩ FOLLOW(Actions)  )

Now:

    First(RestOfScene) = First(Scene) = FIRST(Prologue) ∪ { "entry" } = { "speech", "entry" } }

    Follow(RestOfScene) = FOLLOW(Act) = FIRST(Act) ∪ {"interval" }
                                      = FIRST(Scene) ∪ { "interval" } = { "speech", "entry", "interval" }

so Rule 2 is broken in this case. For the second production in this category

    FIRST(Prologue) = { "speech" }
    FOLLOW(Prologue) = { "entry" }

so Rule 2 is not broken that time, but for the third of these new productions

    FIRST(Actions) = FIRST(Action)  = { "speech", "entry", "exit", "death", "gesticulation" }
    FOLLOW(Actions) = FOLLOW(Scene) = FIRST(RestOfScene) ∪ FOLLOW(Act)
                                    = { "speech", "entry" } ∪ {"speech", "entry" "interval" }
                                    = {"speech", "entry" "interval" }

and Rule 2 is broken again. With more experience it is possible to check Rule 2 more quickly.

      Play   = Act Act Act "interval" Act Act .
      Act    = Scene { Scene } .
      Scene  = { "speech" } "entry" { Action } .
      Action = "speech" | "entry" | "exit" | "death" | "gesticulation" .

and in particular the production

Act = Scene { Scene } .

In this case Rule 2 really governs whether a parser will be able to determine whether it has to go around the loop corresponding to { Scene } for another iteration. It can only do this if FIRST(Scene) has nothing in common with what we could loosely call FOLLOW(loop), that is, FOLLOW( {Scene} ). From the third production we should quickly see that FIRST( {Scene} ) = { "speech" , "entry" }, and from the second production we should quickly see that FOLLOW( {Scene} ) = FOLLOW(Act) = { "interval" } ∪ FIRST(Act), a set which clearly includes the elements { "speech" , "entry" } again.

Even more experience or common sense might bring you to the same conclusion just from "first principles". If there is no recognizable "end of scene" marker, as a member of the audience one cannot readily tell, when a new Scene starts, whether it is the start of a new Act or not. For that matter, one cannot even tell when a new speech starts, whether there has been a scene change or not! The "language" can be made LL(1) if the theatre raises and lowers the curtain between scenes, or actually changes the furniture on the stage, although of course it is no longer strictly equivalent to the original one.

      Play   = Act Act Act "interval" Act Act .
      Act    = "dimlights" Scene { Scene } "raiselights" .
      Scene  = "raisecurtain" { "speech" } "entry" { Action } "lowercurtain" .
      Action = "speech" | "entry" | "exit" | "death" | "gesticulation" .

Similarly, there is a "quick" way to see that the production

Scene = { "speech" } "entry" { Action } .

gets you into trouble. The first loop (corresponding to { "speech" } ) gives no trouble - one more time round the loop if we hear another "speech", very different from the follower, "entry" (you hear one, you see the other, in real life). The second loop is more of a problem. One more time round the loop if we detect any element of the set of { "speech", "entry", "exit", "death", "gesticulation" }? Well, that would be nice, but after the loop is all over we must start the next Scene, and that will also start with one of { "speech", "entry" }.