Computer Science 3 - 2000 - Test on Prac 22

(This should take no longer than 20 minutes and is simply intended to probe whether you have understood the material in the prac questions. You may use your textbook if you think it will help. Answer on the questions sheet.)

1. You were expecting this one, I bet! What is your name?

2. Earlier this week we discussed how one could write grammars to describe Roman numbers. Here are two attempts at doing so:

   COMPILER RomanNumbers $CXN
   /* Describe list of Roman numbers */

   IGNORE CHR(9) .. CHR(12)
   CHARACTERS
     eol = CHR(13) .
   TOKENS
     EOL = eol.

   PRODUCTIONS
     RomanNumbers = { Number EOL } .
     Number    =   Thousands [ Hundreds [ Tens [ Ones ]]]
                 | Hundreds [ Tens [ Ones ]]
                 | Tens [ Ones ]
                 | Ones .
     Thousands = "M" { "M" } .
     Hundreds  = "C" [ "M" | "D" | "C" [ "C" ] ] | "D" [ "C" [ "C" [ "C" ] ] ] .
     Tens      = "X" [ "C" | "L" | "X" [ "X" ] ] | "L" [ "X" [ "X" [ "X" ] ] ] .
     Ones      = "I" [ "X" | "V" | "I" [ "I" ] ] | "V" [ "I" [ "I" [ "I" ] ] ] .
   END RomanNumbers.

   PRODUCTIONS
     RomanNumbers = { Number EOL } .
     Number    = [ Thousands ] [ Hundreds ] [ Tens ] [ Ones ] .
     Thousands = "M" { "M" } .
     Hundreds  = "C" [ "M" | "D" | "C" [ "C" ] ] | "D" [ "C" [ "C" [ "C" ] ] ] .
     Tens      = "X" [ "C" | "L" | "X" [ "X" ] ] | "L" [ "X" [ "X" [ "X" ] ] ] .
     Ones      = "I" [ "X" | "V" | "I" [ "I" ] ] | "V" [ "I" [ "I" [ "I" ] ] ] .
   END RomanNumbers.

Are these grammars equivalent? If not, can you give an example of a string which one would accept and the other would reject?

They are not equivalent. The first grammar will not accept strings like "MI" or "CI" as it insists on there being Tens between Thousands and Ones or between Hundreds and Ones. The second will allow numbers with no characters at all, as for this one Number is completely nullable. A better grammar, without these deficiencies, would be

   PRODUCTIONS
     RomanNumbers = { Number EOL } .
     Number    =   Thousands [ Hundreds ] [ Tens ] [ Ones ]
                 | Hundreds [ Tens ] [ Ones ]
                 | Tens [ Ones ]
                 | Ones .
     Thousands = "M" { "M" } .
     Hundreds  = "C" [ "M" | "D" | "C" [ "C" ] ] | "D" [ "C" [ "C" [ "C" ] ] ] .
     Tens      = "X" [ "C" | "L" | "X" [ "X" ] ] | "L" [ "X" [ "X" [ "X" ] ] ] .
     Ones      = "I" [ "X" | "V" | "I" [ "I" ] ] | "V" [ "I" [ "I" [ "I" ] ] ] .
   END RomanNumbers.

3. In the prac you have just done you were asked to extend a calculator grammar to allow for expressions to have parentheses and also a factorial operator. Here is an attempt to write such a grammar for a similar language where the expressions are made up of algebraic operands a, b or c.

   COMPILER Calc  $XCN
   /* Expressions with  + - * / and ! operators */

   IGNORE CHR(9) .. CHR(13)

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  |  "-" Term } .
     Term       = Factor { "*" Factor |  "/" Factor } .
     Factor     = Primary | "(" Expression ")" .
     Primary    = "a" | "b" | "c" [ "!" ] .
   END Calc.

Does this strike you as satisfactory? If not, can you give an example of a string which would be rejected when you suspect it should have been acceptable, or vice versa, and how would you rewrite the PRODUCTIONS to give a better description?

It is not satisfactory - the factorial operator can only be applied to c (as in c!) and not to say a! or b! or (a+b+c)!. This is easily fixed:

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  | "-" Term } .
     Term       = Factor { "*" Factor | "/" Factor } .
     Factor     = ( Primary | "(" Expression ")" ) [ "!" ] .
     Primary    = "a" | "b" | "c" .
   END Calc.

or, even more simply

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  | "-" Term } .
     Term       = Factor { "*" Factor | "/" Factor } .
     Factor     = ( "a" | "b" | "c" | "(" Expression ")" ) [ "!" ] .
   END Calc.

The PRODUCTIONS section here makes use of the EBNF "meta brackets" { } in the production rules. How would you rewrite your (improved) PRODUCTIONS section without making use either of the {} or [] form of meta brackets?

This was badly done, yet the transformations required are always easy. If you have a right hand side like

P = Something { Option } SomethingElse .

introduce a new non terminal RepeatedOptions

RepeatedOptions = { Option } .

and then rewrite things as

P = Something RepeatedOptions SomethingElse .
RepeatedOptions = Option RepeatedOptions | e .

Similarly, and even more simply, if you have something like

Q = Startoff [ OneOption ] Finish .

introduce a new non terminal Possible

Possible = [ OneOption ] .

and then rewrite things as

Q = Startoff Possible Finish .
Possible = OneOption | e .

with that idea, the last of the PRODUCTIONS section given above becomes

   PRODUCTIONS
     Calc        = Expressions EOF .
     Expressions = Expression Expressions | e .
     Expression  = Term MoreTerms .
     MoreTerms   = ( "+" Term  | "-" Term ) MoreTerms | e .
     Term        = Factor MoreFactors .
     MoreFactors = ( "*" Factor | "/" Factor ) MoreFactors | e .
     Factor      = ( "a" | "b" | "c" | "(" Expression ")" ) FinalOp .
     FinalOp     =  "!" | e  .
   END Calc.

4. Complete the following Cocol specification for a program that would recognize validly constructed railway trains. Railway trains are headed by one or two locomotives, and fall into two categories - passenger trains and goods trains. Goods trains consist of a series of open and closed trucks, in any mixture, followed by optional petrol tankers and then a mandatory brake van. Passenger trains consist of at least three coaches, but maybe more, though the last one of all must be a guard's coach.

If you submitted your specification to Coco/R, would you expect it to complain about LL(1) errors? Discuss briefly.

Several people did not realise that a train has a locomotive to pull it, and left the locomotives out of it altogether; others did not realise that a guard's coach is also a coach (semantically) and so can contribute to the minimum of three coaches needed to make up a passenger train. Ah well. I suppose some of you have never travelled by train?

   COMPILER Spoornet
     PRODUCTIONS
       Spoornet = { Train } EOF.
       Train = "loco" [ "loco" ] ( GoodsPart | PassengerPart ) .
       GoodsPart = { "open" | "closed" } { "petrol" } "brake" .
       PassengerPart = "coach" "coach" { "coach" } "guardCoach" .
   END Spoornet.

To do a proper LL(1) analysis easily, rewrite these as

   COMPILER Spoornet
     PRODUCTIONS
       Spoornet      = Trains EOF.
       Trains        = Train Trains | e .
       Train         = "loco" secondLoco Tail .
       secondLoco    = "loco" | e .
       Tail          = GoodsPart | PassengerPart  .
       GoodsPart     = Trucks Tankers "brake" .
       Trucks        = Truck  Trucks | e.
       Truck         = "open" | "closed" .
       Tankers       = "petrol" Tankers | e .
       PassengerPart = "coach" "coach" MoreCoaches  "guardCoach" .
       MoreCoaches   = "coach" MoreCoaches | e .
   END Spoornet.

To apply Rule 1 we look at the productions for Trains, secondLoco, Tail, Trucks, Truck, Tankers, MoreCoaches - all of which have alternatives. But it is easy to see that the alternatives in each case have disjoint sets.

To apply Rule 2 we look at the productions for Trains, secondLoco, Trucks, Tankers, MoreCoaches - all of which are nullable. But

    FIRST(Trains) = { "loco" }
    FOLLOW(Trains) = { EOF }

    FIRST(secondLoco) = { "loco" }
    FOLLOW(secondLoco) = { "open", "closed", "petrol", "coach", "brake", "guardCoach" }

    FIRST(Trucks) = { "open", "closed" }
    FOLLOW(Trucks) = { "petrol", "brake" }

    FIRST(Tankers) = { "petrol" }
    FOLLOW(Tankers) = { "brake" }

    FIRST(MoreCoaches) = { "coach" }
    FOLLOW(MoreCoaches) = { "guardCoach" }

so Rule 2 is satisfied in all cases where we have to apply it.

Computer Science 3 - 2000 - Test on Prac 22

1. You were expecting this one, I bet! What is your name?

2. In the prac you have just completed you were asked to produce a Cocol grammar to describe a set of BNF productions. Here are two attempts at doing this. Are they equivalent? If not, can you find an example of a BNF production rule that one would accept and the other would reject?

   COMPILER BNF $XCN
   /* Grammar to describe BNF productions */

   CHARACTERS
     letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     alpha  = letter + "0123456789_" .
     eol    = CHR(13) .
     space  = CHR(32) .

   IGNORE CHR(9) .. CHR(12)

   TOKENS
     EOL         = eol .
     nonterminal = "<" letter { alpha | space } ">" .
     terminal    =  letter { alpha } .

   PRODUCTIONS /* attempt 1 */
     BNF        = { Production } EOF .
     Production = nonterminal "::=" Expression EOL .
     Expression = Term { "|" Term } .
     Term       = Factor { Factor } .
     Factor     = nonterminal | terminal | "e" .
   END BNF.

   PRODUCTIONS /* attempt 2 */
      BNF        = { Production } EOF .
      Production = nonterminal "::=" Expression EOL .
      Expression = Term { "|" Term } .
      Term       = Factor { Factor } | "e" .
      Factor     = nonterminal | terminal .
   END BNF.

They are not equivalent. Attempt 1 will allow rules like

<nonterm> ::= term e term e e e .

while Attempt 2 will only allow one e to appear in an option, as in

<nonterm> ::= term | e .

It will, however, allow nonsense like

<nonterm> ::= term | e | e | e .

If one wants to restrict this it turns out to be a bit awkward, actually. There are two ways of doing it, allowing the grammar to remain LL(1) - write the productions as

   PRODUCTIONS /* attempt 3 */
      BNF        = { Production } EOF .
      Production = nonterminal "::=" Expression EOL .
      Expression = [ e "|" ] Term { "|" Term } .
      Term       = Factor { Factor } .
      Factor     = nonterminal | terminal .
   END BNF.

Alternatively, find some way to "count" the e's. We'll see how Coco/R has important "semantic extensions" in a few lectures time.

3. In the prac you have just done you were asked to extend the description of Clang to allow for "inline" assembler statements, such as

Here is a grammar that attempts to do just that.

   COMPILER Inline $XCN
   /* Grammar to describe extended Clang Inline assembler statements */

   CHARACTERS
     letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     digit    = "0123456789_" .
     notQuote = ANY - "'" - CHR(13) .

   IGNORE CHR(9) .. CHR(13)

   TOKENS
     identifier = letter { letter | digit ) .
     number     = digit { digit } .
     string     = "'" notQuote { notQuote } "'" .

   PRODUCTIONS
      Inline = "INLINE"
                 { identifier [ "+" | "-" ] [ number | string ] }
               "END" .
   END Inline.

Does it really succeed? If not, suggest how it should be improved.

This was surprisingly badly done. I had hoped that everyone would have seen that this was far too "permissive" - not all identifiers are valid opcode mnemonics, and not all opcodes can take arbitrary parameters. What I had hoped to get was something more like

   PRODUCTIONS
     Inline = "INLINE" { Code } "END" .
     Code        = OneWord | TwoWord | PrintString .
     OneWord     =   "NEG" | "ADD" | "SUB" | "MUL" | "DVD" | "ODD"
                   | "EQL" | "NEQ" | "LSS" | "GEQ" | "GTR" | "LEQ" | "STK"
                   | "STO" | "HLT" | "INN" | "PRN" | "INC" | "PRC" | "NLN"
                   | "VAL" | "NOP" .
     TwoWord     = ( "LIT" | "ADR" | "PSH" | "POP" | "DSP" | "BRN" | "BZE" )
                   [ "+" | "-" ] number .
     PrintString =   "PRS" string .
   END Inline.

This was badly done, yet the transformations required are always easy. If you have a right hand side like

P = Something { Option } SomethingElse

introduce a new non terminal RepeatedOptions

RepeatedOptions = { Option } .

and then rewrite things as

P = Something RepeatedOptions SomethingElse
RepeatedOptions = Option RepeatedOptions | e .

Similarly, and even more simply, if you have something like

Q = Startoff [ OneOption ] Finish

introduce a new non terminal Possible

Possible = [ OneOption ] .

and then rewrite things as

Q = Startoff Possible Finish
Possible = OneOption | e .

With that idea, the PRODUCTIONS section given above becomes

   PRODUCTIONS
     Inline      = "INLINE" Codes "END" .
     Codes       = Code Codes | e .
     Code        = OneWord | TwoWord | PrintString .
     OneWord     =   "NEG" | "ADD" | "SUB" | "MUL" | "DVD" | "ODD"
                   | "EQL" | "NEQ" | "LSS" | "GEQ" | "GTR" | "LEQ" | "STK"
                   | "STO" | "HLT" | "INN" | "PRN" | "INC" | "PRC" | "NLN"
                   | "VAL" | "NOP" .
     TwoWord     = ( "LIT" | "ADR" | "PSH" | "POP" | "DSP" | "BRN" | "BZE" )
                   Sign  number .
     Sign        = "+" | "-" | e .
     PrintString =   "PRS" string .
   END Inline.

4. Complete the following Cocol specification for a program that would recognize palindromes constructed only of the letters "a" and "b". As you will recall from an earlier prac, palindromes are strings that read the same from either end.

If you submitted your specification to Coco/R, would you expect it to complain about LL(1) errors? Discuss briefly.

This is one of those awful problems that looks deceptively simple, and indeed is deceptive. We need to be able to cater for palindromes of odd or even length, and we need to be able to cater for palindromes of finite length, so that the "repetition" that one immediately thinks of has to be able to terminate.

Here are some that don't work:

   COMPILER Palindrome /* does not terminate */
     PRODUCTIONS
       Palindrome = "a"  Palindrome  "a" | "b"  Palindrome  "b" .
   END Palindrome.

   COMPILER Palindrome /* only allows odd length palindromes */
     PRODUCTIONS
       Palindrome = "a"  Palindrome  "a" | "b"  Palindrome  "b" | "a" | "b" .
   END Palindrome.

   COMPILER Palindrome /* only allows even length palindromes */
     PRODUCTIONS
       Palindrome = "a" [ Palindrome ] "a" | "b" [ Palindrome ] "b" .
   END Palindrome.

Of those grammars, the first is LL(1), but is useless (it is not "reduced" in the sense of the definitions on page 103). The second one is obviously non-LL(1) as the terminals "a" and "b" can start more than one alternative. The third one is less obviously non-LL(1). If you rewrite it

   COMPILER Palindrome /* only allows even length palindromes */
     PRODUCTIONS
       Palindrome = "a" Extra "a" | "b" Extra "b" .
       Extra      = Palindrome | e .
   END Palindrome.

and note that Extra is nullable, then FIRST(Extra) = { "a", "b" } and FOLLOW(Extra) = { "a", "b" }.

Here is another attempt

   COMPILER Palindrome /* allows any length palindromes */
     PRODUCTIONS
       Palindrome = [ "a"  Palindrome  "a" | "b"  Palindrome  "b" | "a" | "b" ] .
   END Palindrome.

This describes both odd and even length palindromes, but is non-LL(1). Palindrome is nullable, and both FIRST(Palindrome) and FOLLOW(Palindrome) = { "a", "b" }.

In fact, when you think about it, you simply will not be able to find an LL(1) grammar for this language. (That is fine; grammars don't have to be LL(1) to be valid grammars. They just have to be LL(1) or very close to LL(1) to be able to write recursive descent parsers. Here's how to think about it. Suppose I asked you to hold your breath for as long as you could, and also to nod your head when you were half way through. I don't believe you could do it - you don't know before you begin exactly how long you will be holding breath. Similarly, if I told you to get into my car and drive it till the tank was empty and to hoot the hooter when you were half way to running out you could not do it.

LL(1) parsers have to be able to decide just by looking at one token exactly what to do next - if they have to guess when they are are half way through parsing they will not be able to do so. One would have to stop applying the options like Palindrome = "a" Palindrome "a" at the point where one had generated or analyzed half the palindrome, and if there is no distinctive character in the middle one would not expect the parser to be able to do so.