Computer Science 3 - 2001 - Test on Prac 17


THURSDAY TEST

1.  No, I still don't know:  What is your surname?

2.  Consider the following simple Cocol grammar that attempts to describe the
    Roman Number representations of the numbers 1 through 9, that is I, II,
    III, IV, V, VI, VII, VIII, IX.

       COMPILER Roman

       PRODUCTIONS
         Roman       = OneToThree | FourOrNine | FiveToEight .
         OneToThree  = "I" [ "I" ] [ "I" ] .
         FourOrNine  = "I" ( "V" | "X" ) .
         FiveToEight = "V" [ OneToThree ].
       END Roman.

    This grammar is correct in one sense, but suffers from several flaws.  How
    many can you find, and can you write a better one?  (Keep it as simple as
    possible; no need to add comments and ignorable sections; concentrate on
    the PRODUCTIONS).

It is badly non-LL(1), and is ambiguous.  A string like II could be derived
from OneToThree by selecting the first and second I's, or the first and third
I's. And the OneToThree and FourToNine alternatives both have FIRST sets
consisting of a single "I".

There are numerous ways of correcting it.  The brute force one would be to say

    Roman = "I" | "II" | "III" | "IV" | "V" | "VI" | "VII" | "VIII" | "IX" | "X" .

and if you suggested that you got credit!  Entering more into the spirit of
things might lead to productions like

   PRODUCTIONS
     Roman                   = OneToThreeOrFourOrNine | FiveToEight .
     ZeroToTwo               = [ "I" [ "I" ]] .
     OneToThreeOrFourOrNine  = "I" [ ZeroToTwo | "V" | "X"  ] .
     FiveToEight             = "V" [ "I" ZeroToTwo ] .
   END Roman.

3.  Consider the attached, hopefully familiar Clang grammar.

    Suppose that you wanted to extend the Clang language so that you could
    allow real numeric literals like 123.45 or .45 or 45. as well as the whole
    number ones (like 1234).  What changes would you have to make to the Cocol
    specification?

The problem here is to make sure that all the valid combinations are allowed,
unambiguously, insisting that at least one digit appear, and with no
possibility of obtaining several periods in one number.  So attempts like the
following are wrong

   TOKENS
     number = digit { digit } [ "." ] { digit }  .         (* ambiguous *)
     number = [ "." ] digit { digit } [ "." { digit } ]  . (* can get 2 "." *)
     number = {digit } [ "." {digit } ] .                  (* nullable *)

Here is a better one:

     number =   digit { digit } [ "."  { digit } ]
              | "." digit { digit }  .

4.  Suppose you wanted to allow Clang to have both one and two dimensional
    arrays, so that you could write code like

        VAR
          Matrix[10, 10], Vector[10], I;

        BEGIN
          I := 1;
          WHILE I <= 10 DO BEGIN
            Vector[I] := Matrix[I, 10] * Matrix[I+4, I-3];
            I := I + 1
          END;

    What changes would have to be made to the Cocol specification?

This was so easy that it is a pity that so many people missed it:

     UpperBound = "[" number [ "," number ]  "]" .
     Designator = identifier [ "[" Expression [ "," Expression ] "]" ] .

5.  A very quick and dirty solution to the stack assembler grammar was picked
    out of the rubbish bin early this morning.  The PRODUCTIONS section read

      PRODUCTIONS
        Program = { Statement } .
        Statement = [ number ] Opcode [ number | string ] EOL .
        Opcode = "ADR" | "LIT" | "BZE" | (* all the rest were there ... *)
                 "STO" | "INN" | "PRS" .
      END Program.

    What critical message would the marker have scribbled on this, if it had
    been handed in?  (Assume that the TOKENS, COMMENTS, CHARACTERS and
    IGNORABLE sections are correct - focus your attention simply on the
    PRODUCTIONS as given here).

Clearly this one, at least, had been encountered in the practical itself.
There are two main problems (a) no distinction is drawn between the various
classes of opcodes, and this is so easily specified that it should have been
done (b) the grammar does not allow completely empty statements (with only the
EOL at the end).  Here is one better solution; there are plenty more like it.

      PRODUCTIONS
        Program   = { [ Statement ] EOL } .
        Statement = [ number ] Operation .
        Operation = Opcode1 | Opcode2 number | Opcode3 string .
        Opcode1   = "STO" | "INN" | "VAL" (* all the ones like this *) .
        Opcode2   = "ADR" | "LIT" | "BZE" (* all the ones like that *) .
        Opcode3   = "PRS" .
      END Program.

FRIDAY TEST

1   No, I still don't know:  What is your surname?

2.  Consider the following simple Cocol grammar that attempts to describe a
    medley of bagpipe tunes on the lines of those you have enjoyed listening
    to all week (you will remember that by tradition a strathspey section is
    always followed by at least one reel).

      COMPILER Medley

      PRODUCTIONS
        Medley = { Tune } "silence" .
        Tune = "march" | "jig" | "hornpipe" | "slowMarch" | "reel" | StrathspeyReel .
        StrathspeyReel = "strathspey" { "strathspey" } "reel" { "reel" } .

      END Medley.

    This grammar is correct in one sense, and incorrect in another.  What is
    wrong with it, and could you write a better one?  (Don't get side tracked
    into discussing the other kinds of competition in the original problem; we
    simply want to describe the Medley event here.)

The grammar is non-LL(1), and ambiguous.  The problem lies with the nullable
{ "reel" } clause.  The better grammar would simply have

        StrathspeyReel = "strathspey" { "strathspey" } "reel" .

and, possibly

        Medley = Tune { Tune } "silence" .

on the grounds that if there are no tunes at all you don't really have
anything to listen to at all!

3.  Consider the attached, hopefully familiar Clang grammar. 

    Suppose that you wanted to extend the Clang language so that you could
    allow hexadecimal numeric literals like  0FFH and binary literals like
    001010B.  What changes would you have to make to the Cocol specification?

This can be done most efficiently by writing

  CHARACTERS
    binDigit = "01" .
    digit    = "0123456789" .
    hexDigit = digit + "ABCDEFabcdef" .

  TOKENS
    number = digit { digit } | digit { hexDigit } "H" | binDigit {binDigit } "B".

Note that the following (submitted in some form by various students) is wrong

    number = digit { digit } | hexDigit { hexDigit } "H" | binDigit {binDigit } "B".

as this would allow, for example  BACH,  which would be a valid identifier as
well as an apparently valid hexadecimal number.  Numbers must start with
digits, and the digits must remain in range.  And, in spite of various
attempts to persuade me otherwise, I remain unimpressed by this sort of thing:

  CHARACTERS
    binDigit = "01B" .
    digit    = "0123456789" .
    hexDigit = digit + "ABCDEFHabcdefH" .

  TOKENS
    number = digit { digit } | digit { hexDigit } | binDigit { binDigit }.

which would have allowed "numbers" like 99HHHHHH123H or B0B0B0B.

4.  Some languages allow one to have variations on the IfStatement that allow
    one to write code like

         if A > B
           then C := D
           elsif A > E then Write('A is bigger than E')
           elsif A > G then while A > G do A := A -1
           else  Read (* new value of *) (A)

    that is, where one has a number of optional "elsif" clauses and an
    optional final "else" clause.  How would you add this statement type to
    the Cocol specification of Clang?

This was so easy that it was alarming to see how many people got it wrong!

    IfStatement       = "IF" Condition "THEN" Statement
                        { "ELSIF" Condition "THEN" Statement }
                        [ "ELSE" Statement ] .

5.  A Keen Type has decided to extend Clang to allow programmers to
    incorporate exponentiation into their expressions, for example, writing
    x^2 + y^3 to mean x2 + y3.  She has had two Bright Ideas.  One is to
    change the productions to

     Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } .
     Term       = Factor { MulOp Factor } .
     Factor     = Primary [ "^" Expression ] .   /* ---------- addition */
     Primary    = Designator | number | "(" Expression ")" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

    and the other is to make the change

     Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } .
     Term       = Factor { MulOp Factor } .
     Factor     = Primary [ "^" Factor ] .       /* ---------- addition */
     Primary    = Designator | number | "(" Expression ")" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

    Explain to her which is the better choice, giving reasons.

Very few people saw the subtlety here.  The first one suffers from the problem
that in a string like the one suggested (that's why I suggested it, as a hint!)

            x^2 + y^3

the operators would seem to bind in such a way that it would "mean"

            x^(2 + y^(3))

which is probably not what one would like.  In fact it is worse than that: the
grammar is non-LL(1) (which is not in itself necessarily a disaster), and it
is ambiguous (which is pretty awful - if you don't believe me, try drawing
parse trees for 4^5*6).

If you want to write the equivalent of x^(2+y^3), then the second grammar
still allows you to do that, since a primary can be a parenthesized
expression; the second one would put the natural interpretation onto x^2 + y^2
as well.


REFERENCE - ORIGINAL CLANG GRAMMAR COMPILER Clang $XCN /* generate compiler, C++ */ /* Simple grammar for Clang level 1 P.D. Terry, Rhodes University, 1998 */ IGNORE CASE IGNORE CHR(9) .. CHR(13) COMMENTS FROM "(*" TO "*)" CHARACTERS cr = CHR(13) . lf = CHR(10) . letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . digit = "0123456789" . instring = ANY - "'" - cr - lf . TOKENS identifier = letter { letter | digit } . number = digit { digit } . string = "'" (instring | "''") { instring | "''" } "'" . PRODUCTIONS Clang = "PROGRAM" identifier ";" Block "." . Block = { ConstDeclarations | VarDeclarations } CompoundStatement . ConstDeclarations = "CONST" OneConst { OneConst } . OneConst = identifier "=" number ";" . VarDeclarations = "VAR" OneVar { "," OneVar } ";" . OneVar = identifier [ UpperBound ] . UpperBound = "[" number "]" . CompoundStatement = "BEGIN" Statement { ";" Statement } "END" . Statement = [ CompoundStatement | Assignment | IfStatement | WhileStatement | ReadStatement | WriteStatement ] . Assignment = Variable ":=" Expression . Variable = Designator . Designator = identifier [ "[" Expression "]" ] . IfStatement = "IF" Condition "THEN" Statement . WhileStatement = "WHILE" Condition "DO" Statement . Condition = Expression RelOp Expression . ReadStatement = "READ" "(" Variable { "," Variable } ")" . WriteStatement = "WRITE" [ "(" WriteElement { "," WriteElement } ")" ] . WriteElement = string | Expression . Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Designator | number | "(" Expression ")" . AddOp = "+" | "-" . MulOp = "*" | "/" . RelOp = "=" | "<>" | "<" | "<=" | ">" | ">=" . END Clang.