Computer Science 301 - 2008

Tutorial for week 22 - Practice in checking LL(1) compliance

You can find the source for these grammars on the WWW page in the file tut22.zip.

Are the following grammars LL(1) compliant? If not, why not?

    COMPILER A $TF

      PRODUCTIONS
        A = B ( "x" | "z" ) ( "w" | "z" ) B .
        B = "x" B "z" | { "y" } .

    END A.

The best way to solve these problems may be to eliminate the meta-symbols. For example:

        A = B C D B .            1
        C = "x" | "z" .          2
        D = "w" | "z" .          3
        B = "x" B "z" | E .      4
        E = e | "y" E .          5

Of these productions 2 - 5 offer alternatives and so should be checked for transgressions of "Rule 1". 4 and 5 show that the non-terminals B and E are nullable, and so these non-terminals should be checked for transgressions of "Rule 2".

Rules 2 - 5 come through the Rule 1 test with flying colours:

     FIRST(C1) = { "x" }     FIRST(C2) = { "z" }.   Intersection is empty.
     FIRST(D1) = { "w" }     FIRST(D2) = { "z" }.   Intersection is empty.
     FIRST(B1) = { "x" }     FIRST(B2) = { "y" }.   Intersection is empty.
     FIRST(E1) =  Empty      FIRST(E2) = { "y" }.   Intersection is empty.

For the nullable non-terminals

     FIRST(B) = { "x", "y" }    FOLLOW(B) = FIRST(C) U { "z" } = { "x", "z" }

so Rule 2 is broken for B.

     FIRST(E) = { "y" }         FOLLOW(E) = FOLLOW(B) = { "x", "z" }

so Rule 2 is satisfied for E.


   COMPILER S $FT

     PRODUCTIONS
       S = A | B .
       A = C D | E "f" .
       B = "d" | "e" .
       C = [ "a" | "b" ] .
       D = [ "c" | "e" ] .
       E = "f" | "g" .

   END S.

Rewrite the productions:

       S = A | B .               1
       A = C D | E "f" .         2
       B = "d" | "e" .           3
       C = "a" | "b" | e         4
       D = "c" | "e" | e         5
       E = "f" | "g" .           6

All of these offer alternatives, C and D are clearly nullable, and so A and hence also S are nullable;

     FIRST(E1) = { "f" }     FIRST(E2) = { "g" }.   Intersection is empty.
     FIRST(D1) = { "c" }     FIRST(D2) = { "e" }.   Intersection is empty.
     FIRST(C1) = { "a" }     FIRST(C2) = { "b" }.   Intersection is empty.
     FIRST(B1) = { "d" }     FIRST(B2) = { "e" }.   Intersection is empty.
     FIRST(A1) = FIRST(C D) = FIRST(C) U FIRST(D) = { "a", "b", "c", "e" }
     FIRST(A2) = FIRST(E "f" ) = FIRST(E) = { "f", "g" }.   Intersection is empty.
     FIRST(S1) = FIRST(A) = FIRST(A1) U FIRST(A2) = { "a", "b", "c", "e", "f", "g" }
     FIRST(S2) = FIRST(B) = { "d", "e" }. Intersection { "e" } - Rule broken.

For the nullable non-terminals

     FIRST(C) = { "a", "b" }    FOLLOW(C) = FIRST(D) = { "c", "e" }
     FIRST(D) = { "c", "e" }    FOLLOW(D) = Empty
     FIRST(A) = { "a", "b", "c", "e", "f", "g" } FOLLOW(A) = Empty
     FIRST(S) = { "a", "b", "c", "d", "e", "f", "g" } FOLLOW(S) = Empty

so Rule 2 is satisfied for all nullable non-terminals. (One might argue that FIRST(S) and FOLLOW(S) both contain EOF (and similarly FIRST(A) and FOLLOW(A)), but we do not have to worry about those case. EOF is not a "real" terminal, and would be "scanned" in a different way. It appears in various places in the text for clarification only.)


   COMPILER Z $TF

     PRODUCTIONS
       Z = B C | D E .
       B = [ E ] .
       C = "x" E .
       D = E { Z } ( E | "w" ) .
       E = "y" | "z" .

   END Z.

Rewrite the productions:

       Z = B C | D E .           1
       B =  E | e .              2
       C = "x" E .               3
       D = E F G .               4
       F = e | Z F .             5
       G = E | "w" .             6
       E = "y" | "z" .           7

Productions 2, 5, 6, 7 have alternatives that pretty obviously satisfy Rule 1. Production 1 needs more analysis.

    FIRST(Z1) = FIRST(B C) = FIRST(E) U FIRST(C) = { "y", "z", "x" }
    FIRST(Z2) = FIRST(D E) = FIRST(D) = { "y", "z" )

So rule 1 is broken.

B and F are nullable

    FIRST(B) = FIRST(E) = { "y" , "z" }
    FOLLOW(B) = FIRST(C) = { "x" }

so Rule 2 is satisfied for B

    FIRST(F) = FIRST(Z) = { "x", "y", "z" }
    FOLLOW(F) = FIRST(G) = { "w", "y", "z" }
so Rule 2 is broken for F


The following describes a silly computer language. Study it carefully and decide whether it is LL(1) compliant. If not, explain why, and make suggestions as to how it might be slightly modified in that case so that a recursive descent parser could be constructed for it.

   COMPILER Block $TF

     CHARACTERS
       letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

     TOKENS
       identifier = letter { letter } .

     IGNORE CHR(0) .. CHR(31)

     PRODUCTIONS
       Block             = "BEGIN" StatementSequence "END" .
       StatementSequence = Statement { ";" Statement } .
       Statement         = [   Block | Declaration | Assignment
                             | WhileStatement | DoStatement | IfStatement ].
       Declaration       = identifier { "," identifier } ":" Type .
       IfStatement       = "IF" Expression
                             [ "THEN" ] StatementSequence
                             [ "ELSE" StatementSequence ] .
       Expression        = identifier . /* for illustration only */
       WhileStatement    = "WHILE" Expression "DO" StatementSequence "END" .
       DoStatement       = "DO" StatementSequence "WHILE" Expression  .
       Assignment        = identifier ":=" Expression .
       Type              = "INTEGER" | "BOOLEAN" | "CHAR" .
   END Block.

A full answer will not be provided here. You are invited to work through it in detail, and to check your work by submitting the grammar to Coco/R, which will also provide lists of FIRST and FOLLOW sets if you include an F in the $Pragma string at the top of the grammar.

The grammar is non-LL(1) because

Would it perhaps have been wiser to try to write the grammar with the following alternatives for IfStatement?

       IfStatement = "IF" Expression
                     ( Statement | "THEN" Statement "ELSE" Statement" ) .

That would work, but it would be a nasty language feature, I think.