Computer Science 3 - 2003

Programming Language Translation


Practical for Week 21, beginning 15 September 2003 - Solutions

Complete sources to these solutions can be found on the course WWW pages in the file PRAC21A.ZIP.


Task 2 - Extensions to the Simple Calculator

Exponentiation is a stronger operation than multiplication, so it has to be introduced deeper into the hierarchy. Functions like sqrt also take precedence over other operations:

   COMPILER Calc  $CN
   /* Simple four function calculator - extended
      P.D. Terry, Rhodes University, 2003 */

   CHARACTERS
     digit      = "0123456789" .
     hexdigit   = digit + "ABCDEF" .

   IGNORE CHR(1) .. CHR(31)

   TOKENS
     decNumber  = digit { digit } .
     hexNumber  = "$" hexdigit { hexdigit } .

   PRODUCTIONS
     Calc       = { Expression "=" } EOF .
     Expression = Term { "+" Term  |  "-" Term } .
     Term       = Factor { "*" Factor |  "/" Factor } .
     Factor     = Primary [ "^" Factor ] .
     Primary    =   decNumber | hexNumber
                  | "(" Expression ")"
                  | "sqrt" "(" Expression ")" .
   END Calc.


Task 3 - Highland Gatherings

The solutions received were mixed. Some didn't capture the idea that there should be N competitions each with M bands. A typical over simplified attempt looks like this

     Gath1         = { Competition } .
     Competition   = SlowQuick |  MSR |  Medley .
     SlowQuick     = "SlowMarch" "March"  .
     MSR           = "March" "Strathspey" "Reel" [ "March" ] .
     Medley        = OneTune { OneTune } .
     OneTune       =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                     | "Strathspey" { "Strathspey" } "Reel" .

If we want to introduce the idea that there are multiple competitions with multiple bands, then surely it makes sense to try

   COMPILER Gath2 $CN
   /* Describes the Pipe Band events at a Highland Gathering
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CHR(1) .. CHR(31)

   PRODUCTIONS
     Gath2       = { Competition } .
     Competition = Band { Band } .
     Band        = SlowQuick | MSR | Medley .
     SlowQuick   = "SlowMarch" "March" .
     MSR         = "March" "Strathspey" "Reel" [ "March" ] .
     Medley      = OneTune { OneTune } .
     OneTune     =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                   | "Strathspey" { "Strathspey" } "Reel" .

   END Gath2.

This is badly non LL(1) (It represents a continual wail of noise). We need to get some sort of break between bands at least:

   COMPILER Gath3 $CN
   /* Describes the Pipe Band events at a Highland Gathering
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CHR(1) .. CHR(31)

   PRODUCTIONS

     Gath3       = { "AnnounceCompetition" Competition } .
     Competition = Band { Band } .
     Band        = "AnnounceBand" ( SlowQuick | MSR | Medley ).
     SlowQuick   = "SlowMarch" "March" "break" .
     MSR         = "March" "Strathspey" "Reel" [ "March" ] "break" .
     Medley      = OneTune { OneTune } "break" .
     OneTune     =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                   | "Strathspey" { "Strathspey" } "Reel" .

   END Gath3.

Even this is non-LL(1). We can get an LL(1) grammar if we make the right kind of announcements:

   COMPILER Gath4 $CN
   /* Describes the Pipe Band events at a Highland Gathering
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CHR(1) .. CHR(31)

   PRODUCTIONS

     Gath4       = { "AnnounceCompetition" Competition } .
     Competition = Band { Band } .
     Band        =   "AnnounceSlow"  SlowQuick
                   | "AnnounceMSR" MSR
                   | "AnnounceMedley" Medley .
     SlowQuick   = "SlowMarch" "March" "break" .
     MSR         = "March" "Strathspey" "Reel" [ "March" ] "break" .
     Medley      = OneTune { OneTune } "break" .
     OneTune     =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                   | "Strathspey" { "Strathspey" } "Reel" .

   END Gath4.

Here is yet another solution:

   COMPILER Gath5 $CN
   /* Describes the Pipe Band events at a Highland Gathering
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CHR(1) .. CHR(31)

   PRODUCTIONS

     Gath5         = { Competition } .
     Competition   =   "AnnounceSlow"   SlowQuickComp
                     | "AnnounceMSR"    MSRComp
                     | "AnnounceMedley" MedleyComp .
     SlowQuickComp = SlowQuick { SlowQuick } .
     SlowQuick     = "SlowMarch" "March" "break" .
     MSRComp       = MSR { MSR } .
     MSR           = "March" "Strathspey" "Reel" [ "March" ] "break" .
     MedleyComp    = Medley { Medley } .
     Medley        = OneTune { OneTune } "break" .
     OneTune       =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                     | "Strathspey" { "Strathspey" } "Reel" .

   END Gath5.

In all these notice that we have not fallen into the trap of defining:

     OneTune     =   "March" | "SlowMarch" | "Jig" | "HornPipe" | "Reel"
                   | "Strathspey" { "Strathspey" } "Reel" { "Reel" } .

which is non-LL(1) again, or of trying to write

     OneTune     =   "March" | "SlowMarch" | "Jig" | "HornPipe" |
                   | "Strathspey" { "Strathspey" } "Reel" { "Reel" } .

which is LL(1), but precludes reels being played without a strathspey immediately before them.


Task 4 - Alternative description of EBNF

As hinted in the prac sheet, the trick here is to rework the productions that use meta braces for repetition by ones that use right recursion (to avoid LL(1) errors. Here is one possibility:

   COMPILER EBNF $CN
   /* Parse a set of EBNF productions
      P.D. Terry, Rhodes University, 2003 */

   CHARACTERS
     letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     lowline  = "_" .
     control  = CHR(1) .. CHR(31) .
     digit    = "0123456789" .
     noquote1 = ANY - "'" - control .
     noquote2 = ANY - '"' - control .

   IGNORE  CHR(9) .. CHR(13)

   COMMENTS FROM "(*" TO "*)"  NESTED

   TOKENS
     nonterminal = letter { letter | lowline | digit } .
     terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

   PRODUCTIONS
     EBNF        = Productions EOF .
     Productions = Production Productions | .
     Production  = nonterminal "=" Expression "." .
     Expression  = Term MoreTerms .
     MoreTerms   = "|" Term MoreTerms | .
     Term        = Factor MoreFactors | .
     MoreFactors = Factor MoreFactors | .
     Factor      =   nonterminal
                   | terminal
                   | "[" Expression "]"
                   | "(" Expression ")"
                   | "{" Expression "}" .
   END EBNF.

The productions can actually be expressed more simply:

   COMPILER EBNF $CN
   /* Parse a set of EBNF productions
      P.D. Terry, Rhodes University, 2003 */

   CHARACTERS
     letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     lowline  = "_" .
     control  = CHR(1) .. CHR(31) .
     digit    = "0123456789" .
     noquote1 = ANY - "'" - control .
     noquote2 = ANY - '"' - control .

   IGNORE  CHR(9) .. CHR(13)

   COMMENTS FROM "(*" TO "*)"  NESTED

   TOKENS
     nonterminal = letter { letter | lowline | digit } .
     terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

   PRODUCTIONS
     EBNF        = Productions EOF .
     Productions = Production Productions | .
     Production  = nonterminal "=" Expression "." .
     Expression  = Term MoreTerms .
     MoreTerms   = "|" Term MoreTerms | .
     Term        = Factor Term | .
     Factor      =   nonterminal
                   | terminal
                   | "[" Expression "]"
                   | "(" Expression ")"
                   | "{" Expression "}" .
   END EBNF.


Task 5 - Extensions to Parva

These were very simple really. We cannot avoid the LL(1) warning for the else option in the IfStatement. This is the well-known "dangling else" problem that is a feature of many real languages. Fortunately - unlike some other LL(1) violations - it causes no harm.

   COMPILER Parva $CN
   /* Parva level 1 grammar  - Extended for prac 21
      P.D. Terry, Rhodes University, 2003
      Grammar only */

   CHARACTERS
     lf         = CHR(10) .
     backslash  = CHR(92) .
     control    = CHR(0) .. CHR(31) .
     letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     digit      = "0123456789" .
     stringCh   = ANY - '"' - control - backslash .
     charCh     = ANY - "'" - control - backslash .
     printable  = ANY - control .

   TOKENS
     identifier = letter { letter | digit | "_" ( letter | digit ) } .
     number     = digit { digit } .
     stringLit  = '"' { stringCh | backslash printable } '"' .
     charLit    = "'" ( charCh   | backslash printable ) "'" .

   IGNORE CHR(9) .. CHR(13)
   COMMENTS FROM "//" TO lf
   COMMENTS FROM "/*" TO "*/"

   PRODUCTIONS
     Parva             = "void" identifier "(" ")" Block .
     Block             = "{" { Statement } "}" .
     Statement         = { Label }
                         (   Block | ConstDeclarations | VarDeclarations | Assignment
                           | IfStatement | WhileStatement | ReturnStatement | HaltStatement
                           | DoWhileStatement | ForStatement | GoToStatement
                           | ReadStatement | WriteStatement
                           | ";"
                         ) .
     Label             = number ":" .
     ConstDeclarations = "const" OneConst { "," OneConst } ";" .
     OneConst          = identifier "=" Constant .
     Constant          = number | charLit | "true" | "false" | "null" .
     VarDeclarations   = Type OneVar { "," OneVar } ";" .
     OneVar            = identifier [ "=" Expression ] .
     Assignment        = Designator "=" Expression ";" .
     Designator        = identifier [ "[" Expression "]" ] .
     IfStatement       = "if" "(" Condition ")" Statement [ "else" Statement ] .
     WhileStatement    = "while" "(" Condition ")" Statement .
     ReturnStatement   = "return" ";" .
     HaltStatement     = "halt" ";" .
     DoWhileStatement  = "do" Statement "while" "(" Condition ")" ";" .
     ForStatement      = "for" "(" [ BasicType ] identifier "=" Expression ";"
                         Condition ";" identifier "=" Expression ")" Statement .
     GoToStatement     = "goto" number .
     ReadStatement     = "read" "(" ReadElement { "," ReadElement } ")" ";" .
     ReadElement       = stringLit | Designator .
     WriteStatement    = "write" "(" WriteElement { "," WriteElement } ")" ";" .
     WriteElement      = stringLit | Expression .
     Condition         = Expression .
     Expression        = AndExp { "||" AndExp } .
     AndExp            = EqlExp { "&&" EqlExp } .
     EqlExp            = RelExp { EqlOp RelExp } .
     RelExp            = AddExp [ RelOp AddExp ] .
     AddExp            = MulExp { AddOp MulExp } .
     MulExp            = Factor { MulOp Factor } .
     Factor            = Primary | ( "+" | "-" | "!" ) Factor .
     Primary           =   Designator | Constant
                         | "new" BasicType "[" Expression "]"
                         | "(" Expression ")" .
     Type              = BasicType [ "[]" ] .
     BasicType         = "int" | "bool" .
     AddOp             = "+" | "-" .
     MulOp             = "*" | "/" | "%" .
     EqlOp             = "==" | "!=" .
     RelOp             = "<" | "<=" | ">" | ">=" .
   END Parva.


Task 6 - Spoornet are looking for programmers

The wheels came off in many solutions. It is quite hard to get right, and one cannot easily find an LL(1) grammar that really matches the problem as set. Here is a simple first attempt, but with no safety regulations.

   COMPILER Train1 $CN
   /* Grammar for simple railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR(1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train1       = { OneTrain } EOF .
     OneTrain     = LocoPart GoodsPart HumanPart "." .
     LocoPart     = "loco" { "loco" } .
     GoodsPart    = Truck { Truck } .
     HumanPart    = "guard" | { "coach" } "brake" .
     Truck        = "coal" | "open" | "cattle" | "fuel" | "petrol" .
   END Train1.

Here is an attempt at safety. But this one insists on at least two safe trucks in any train, and is not LL(1)

   COMPILER Train2 $CN
   /* Grammar for safer railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR(1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train2       = { OneTrain } EOF .
     OneTrain     = LocoPart GoodsPart HumanPart "." .
     LocoPart     = "loco" { "loco" } .
     GoodsPart    = SafeTruck { AnyTruck } .
     HumanPart    = "guard" | SafeTruck { "coach" } "brake" .
     SafeTruck    = "coal" | "open" | "cattle" .
     AnyTruck     = SafeTruck | "fuel" | "petrol" .
   END Train2.

Why is it not LL(1)? We could apply all the theory of chapter 7, but maybe an example will suffice. Suppose we have a valid train like

    loco coal coal coal coal coach brake

The first coal truck is parsed by the leading SafeTruck in GoodsPart. The next two coal trucks must be parsed by the repetitive part { AnyTruck }, but you can probably see that the last coal truck would have to be parsed by the alternative within HumanPart. Unfortunately an LL(1) parser can't see far enough ahead to make that decision, and would be tempted to treat this last coal truck as part of the { AnyTruck } sequence.

Here is another one, but still non-LL(1)

   COMPILER Train3 $CN
   /* Grammar for safer railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR(1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train3       = { OneTrain } EOF .
     OneTrain     = LocoPart Load "." .
     LocoPart     = "loco" { "loco" } .
     Load         = SafeTruck Tail.
     Tail         =   { AnyTruck } "guard"
                    | Passengers
                    | { AnyTruck } SafeTruck Passengers .
     Passengers   = { "coach" } "brake" .
     SafeTruck    = "coal" | "open" | "cattle" .
     AnyTruck     = SafeTruck | "fuel" | "petrol" .
   END Train3.

and here is yet another, but still non-LL(1)

   COMPILER Train4 $CN
   /* Grammar for safer railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR (1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train4       = { OneTrain } EOF .
     OneTrain     = LocoPart Load "." .
     LocoPart     = "loco" { "loco" } .
     Load         = SafeTruck Tail.
     Tail         = Passengers | { AnyTruck } SafeTail .
     SafeTail     = SafeTruck ( Passengers | "guard" ) .
     Passengers   = { "coach" } "brake" .
     SafeTruck    = "coal" | "open" | "cattle" .
     AnyTruck     = SafeTruck | "fuel" | "petrol" .
   END Train4.

Here is one that is LL(1)

   COMPILER Train5 $CN
   /* Grammar for simple railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR(1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train5       = { OneTrain } EOF .
     OneTrain     = LocoPart GoodsPart HumanPart "." .
     LocoPart     = "loco" { "loco" } .
     GoodsPart    = SafeTruck MoreTrucks .
     MoreTrucks   = { FuelTruck { FuelTruck } SafeTruck | SafeTruck } .
     HumanPart    = "guard" | { "coach" } "brake" .
     SafeTruck    = "coal" | "open" | "cattle" .
     FuelTruck    = "fuel" | "petrol" .
   END Train5.

where you might notice that we have used a production rather like that used in describing the sequence of tunes in the Medley competition in Task 4. At first you might thing that this is, at last, a correct solution. But no, it isn't quite. This solution does not allow you to have a train like

    loco loco open fuel fuel guard .

as the last fuel truck in a sequence has to be followed by at least one safe truck. The grammar does, however, allow trains like

    loco open coach coach brake .

with only one truck in the goods section.

It is remarkable that something that at first sight looks so simple should turn out to be frustratingly difficult. Not being able to find an LL(1) grammar is not a train smash - one quite often cannot find an LL(1) grammar for a language. But it's usually worth a try, as parsers for LL(1) grammars are so easy to write. For a long time I have believed that one could not find an LL(1) grammar for this language, but this year various students have proved me wrong (it's always exciting when that happens!). Here is a solution that seems to meet all requirements. This one also allows empty trains, or passenger only trains as an extension to the original exercise.

   COMPILER Train7 $CN
   /* Grammar for simple railway trains
      P.D. Terry, Rhodes University, 2003 */

   IGNORE CASE
   IGNORE CHR(1) .. CHR(31)
   COMMENTS FROM "(*" TO "*)" NESTED

   PRODUCTIONS
     Train7       = { OneTrain } EOF .
     OneTrain     = LocoPart [ SafeLoad | "guard" | Passengers ] "." .
     LocoPart     = "loco" { "loco" } .
     Passengers   = { "coach" } "brake" .
     SafeLoad     = SafeTruck RestSafeLoad .
     RestSafeLoad = SafeLoad | "guard" | Passengers | Fuel .
     Fuel         = FuelTruck { FuelTruck } ( SafeLoad | "guard" ) .
     SafeTruck    = "coal" | "open" | "cattle" .
     FuelTruck    = "fuel" | "petrol" .
   END Train7.

Of course there are other ways around this problem, and we shall come back to it later.


Home  © P.D. Terry