Practical 21 solution

Computer Science 3 - 2006

Programming Language Translation

Practical for Week 21, beginning 18 September 2006 - Solutions

You can obtain machine readable versions of these grammars from the solution kit PRAC21A.ZIP or PRAC21AC.ZIP if you would like to experiment with them further.

Task 2

Extending the calculator grammar can be done in several ways. Here is a simple one of them, which corresponds to the approach taken in languages like Pascal, which do not allow two signs to appear together:

    COMPILER Calc1 $CN
    /* Simple four function calculator - extended
       P.D. Terry, Rhodes University, 2006 */

    CHARACTERS
      digit      = "0123456789" .
      hexdigit   = digit + "ABCDEF" .

    TOKENS
      decNumber  = digit { digit } .
      hexNumber  = "$" hexdigit { hexdigit } .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Calc1      = { Expression "=" } EOF .
      Expression = ["+" | "-" ] Term { "+" Term  | "-" Term } .
      Term       = Factor { "*" Factor | "/" Factor } .
      Factor     = Primary { "!" } .
      Primary    = decNumber | hexNumber | [ "sqrt" ] "(" Expression ")" .
    END Calc1.

Another approach, similar to that taken in C++, is as follows:

    PRODUCTIONS
      Calc2      = { Expression "=" } EOF .
      Expression = Term { "+" Term  | "-" Term } .
      Term       = Factor { "*" Factor | "/" Factor } .
      Factor     = ( "+" | "-" ) Factor | Primary { "!" } .
      Primary    = decNumber | hexNumber | [ "sqrt" ] "(" Expression ")" .
    END Calc2.

This allows for expressions like 3 + - 7 or even 3 * -4 or even 3 / + - 4!. Because of the way the grammar is written, the last of these is equivalent to 3 / ( + ( - (4!))). It is clearer like this than if one tries to simplify the definition of Factor still further to

       Factor     = ( "+" | "-" ) Primary { "!" } .

in which the interpretation of -4! would be (-4)! and not -(4!) as it should be.

Here are some other suggestions. What, if any, differences are there between these and the other solutions presented so far?

    PRODUCTIONS
      Calc4      = { Expression "=" } EOF .
      Expression = Term { "+" Term  | "-" Term } .
      Term       = Factor { "*" Factor | "/" Factor } .
      Factor     = ( "+" | "-" ) Factor | Primary | "sqrt" "(" Expression ")" ) .
      Primary    = ( decNumber | hexNumber | "(" Expression ")" ) { "!" } .
    END Calc4.

    PRODUCTIONS
      Calc5      = { Expression "=" } EOF .
      Expression = ["+" | "-" ] Term { "+" Term  | "-" Term } .
      Term       = Factor { "*" Factor | "/" Factor } .
      Factor     = Primary { "!" } | "sqrt" "(" Expression ")" .
      Primary    = decNumber | hexNumber | "(" Expression ")" .
    END Calc5.

Several people suggested productions like this

      Factor     = ( "+" | "-" ) Factor | Primary | "sqrt(" Expression ")" ) .

A terminal like "sqrt(" is restrictive. It is usually better to allow white space to appears between method names and parameter lists if the user prefers thiis style.

Task 3 - Happy families

This was meant to be very straightforward and should have caused no difficulties. Here is one solution in the spirit of the exercise:

    COMPILER Family1 $CN
    /* Describe a family
       P.D. Terry, Rhodes University, 2006
       Grammar only */

    CHARACTERS
      control      = CHR(0) .. CHR(31) .
      letter       = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      digit        = "0123456789" .

    TOKENS
      name         = letter { letter | "'" letter | "-" letter } .
      number       = digit { digit } .

    IGNORE control

    PRODUCTIONS
      Family1      = { Section } .
      Section      = Surname | Parents | Grandparents | Children | Appendage .
      Surname      = "Family" ":" name { name } .
      Parents      = "Parents" ":" NameList .
      Grandparents = "Grandparents" ":" NameList .
      Children     = "Children" ":" NameList .
      NameList     = OnePerson { "," OnePerson } .
      OnePerson    = name { name } [ "(" "deceased" ")" ] .
      Appendage    = number (   "cat"   | "cats"   | "dog" | "dogs"
                              | [ "small" ] ( "house" | "houses") | "car" | "cars" ) .
    END Family1.

That solution does not insist that the surname should be part of all descriptions. Here is an alternative PRODUCTIONS set that does just that, and also factorizes the grammar slightly differently:

   PRODUCTIONS
     Family2      = { Generation } Surname { Generation } { Appendage } .
     Surname      = "Family" ":" name { name } .
     Generation   = ( "Parents" | "Grandparents" | "Children" ) ":" NameList .
     NameList     = OnePerson { "," OnePerson } .
     OnePerson    = name { name } [ "(" "deceased" ")" ] .
     Appendage    = number [ "small" ] ( "cat"   | "cats"   | "dog" | "dogs" |
                                         "house" | "houses" | "car" | "cars" ) .
   END Family2.

Three more points are worth making (a) the Surname section should not have allowed the possibility of listing the name as deceased (b) it is better to use a construct like "(" "deceased" ")" than "(deceased)" as a single terminal (c) we could have used the terminal name instead of listing the specific possessions in the family.

Task 4 - One for the Musicians in our Midst (but the rest of you should do it too)

This is straightforward, but note the way in which an eol singleton character set is introduced from which the single character EOL token is defined - this is a rather unusual case (in most languages end-of-line is insignificant). Note also that a line of words might also contain some solfa key words as ordinary words - for example "so". Note how the token word has been defined - multiple - and ' characters are allowed, but at most one trailing punctuation mark. We probably would not want to cater for sequesnces like Tom!!,Dick,Harry as making up one word.

    COMPILER Solfa $CN
    /* Describe the words and notes of a tune using tonic solfa
       P.D. Terry, Rhodes University, 2006
       Grammar only */

    CHARACTERS
      eol        = CHR(10) .
      control    = CHR(0) .. CHR(31) .
      letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

    TOKENS
      word       = letter { letter | "'" | "-" letter } [ "." | "," | "!" | "?" ] .
      EOL        = eol .

    IGNORE control - eol

    PRODUCTIONS
      Solfa      = { Line } .
      Line       = Words EOL Tune EOL EOL { EOL } .
      Words      = ( word | Note ) { word | Note } .
      Tune       = Note { Note } .
      Note       = "do" | "re" | "mi" | "fa" | "so" | "la" | "ti" .
    END Solfa.

Task 5 - So what if Parva is so restrictive - fix it!

The Parva extensions produced some interesting submissions. Many of them (understandably!) were too restrictive in certain respects, while others were too permissive. Here is a suggested solution:

    COMPILER Parva $CN
    /* Parva level 1.5 grammar - Prac 21 extensions
       P.D. Terry, Rhodes University, 2006
       Grammar only */

    CHARACTERS
      lf         = CHR(10) .
      backslash  = CHR(92) .
      control    = CHR(0) .. CHR(31) .
      letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      digit      = "0123456789" .
      stringCh   = ANY - '"' - control - backslash .
      charCh     = ANY - "'" - control - backslash .
      printable  = ANY - control .

    TOKENS

/* Insisting that identifiers cannot end with an underscore is quite easy */

      identifier = letter { letter | digit | "_" { "_" } ( letter | digit ) } .

/* but a simpler version is what most people thought of

      identifier = letter { letter | digit | "_" ( letter | digit ) } .

*/

      number     = digit { digit } .
      stringLit  = '"' { stringCh | backslash printable } '"' .
      charLit    = "'" ( charCh   | backslash printable ) "'" .

    COMMENTS FROM "//" TO lf
    COMMENTS FROM "/*" TO "*/"

    IGNORE CHR(9) .. CHR(13)

    PRODUCTIONS
      Parva             = "void" identifier "(" ")" Block .

      Block             = "{" { Statement } "}" .

/* We need some more nonterminals for the new statement forms */

      Statement         =   Block | ConstDeclarations | VarDeclarations
                          | Assignment | IncOrDecStatement
                          | IfStatement | WhileStatement | RepeatStatement | ForStatement
                          | ReturnStatement | HaltStatement
                          | ReadStatement | WriteStatement
                          | ";" .

/* Declarations remain the same as before */

      ConstDeclarations = "const" OneConst { "," OneConst } ";" .
      OneConst          = identifier "=" Constant .
      Constant          = number | charLit | "true" | "false" | "null" .
      VarDeclarations   = Type OneVar { "," OneVar } ";" .
      OneVar            = identifier [ "=" Expression ] .

/* To deal with statements like i = 6; i, j = 4, k; and i++; we need to manipulate the
   production for Assignment if we want to preserve an LL(1) grammar */

      Assignment        = Designator
                          (   { "," Designator } "=" Expression { "," Expression }
                            | "++"
                            | "--"
                          ) ";" .

/* Prefix increment/decrement statements like ++i; or ++list[7]; cause no LL(1) problems */

      IncOrDecStatement = ( "++" | "--" ) Designator ; .

/* In all these it is useful to maintain generality by using Designator, not identifier */

      Designator        = identifier [ "[" Expression "]" ] .

/* The if-then-elsif-else construction is most easily described as follows. Although
   this is not LL(1), this works admirably - it is simply the well-known dangling
   else ambiguity, which the parser resolves by associating elsif and else clauses
   with the most recent if */

      IfStatement       = "if" "(" Condition ")" Statement
                          { "elsif" "(" Condition ")" Statement }
                          [ "else" Statement ] .

/* The Pascal-like "repeat" statement is almost trivial.  Note that we can make use of
   several Statements between "repeat" and "until"  (different in Java!) */

      RepeatStatement   = "repeat" { Statement } "until" "(" Condition ")" ";" .

/* The for statement is straightforward, but introduces the concept of a list of
   expressions which is also used in the production for Expression itself */

      ForStatement      = "for" Designator "in" ExpList Statement .
      ExpList           = "(" Range { "," Range } ")" .
      Range             = Expression [ ".." Expression ] .

/* Most of the rest of the grammar remains unchanged: */

      WhileStatement    = "while" "(" Condition ")" Statement .
      ReturnStatement   = "return" ";" .
      HaltStatement     = "halt" ";" .
      ReadStatement     = "read" "(" ReadElement { "," ReadElement } ")" ";" .
      ReadElement       = stringLit | Designator .
      WriteStatement    = "write" "(" WriteElement { "," WriteElement } ")" ";" .
      WriteElement      = stringLit | Expression .
      Condition         = Expression .

/* The basic form of Expression introduces "in", effectively as another relational operator
   with the same precedence as the other relational operators */

      Expression        = AddExp [ RelOp AddExp | "in" ExpList ] .
      AddExp            = [ "+" | "-" ] Term { AddOp Term } .
      Term              = Factor { MulOp Factor } .
      Factor            =   Designator | Constant
                          | "new" BasicType "[" Expression "]"
                          | "!" Factor | "(" Expression ")" .
      Type              = BasicType [ "[]" ] .
      BasicType         = "int" | "bool" .
      AddOp             = "+" | "-" | "||" .

/* The % operator has the same precedence as other multiplicative operators */

      MulOp             = "*" | "/" | "%" | "&&" .
      RelOp             = "==" | "!=" | "<" | "<=" | ">" | ">=" .
    END Parva.