Practical 24 solutions

Computer Science 3 - 2005

Programming Language Translation

Practical for Week 24, beginning 10 October 2005 - solutions

As usual, the sources of full solutions for these problems may be found on the course web page as the file PRAC24A.ZIP or PRAC24AC.ZIP.

While there were some splendid submissions, there were also some very weak ones, so please study the suggestions below, as the ability to add attributes and actions to grammars is crucially important if you are to use a tool like Coco. Three general points: Firstly, several people had not incorporated the call to Parser.output.close() to close their output files. If you don't do this, the output is likely to be truncated badly, or even discarded. Secondly, I am amused every year to read submissions that have assignments like

A = B * -1;

instead of the much more obvious and simpler

A = -B;

This happens so often that I can only believe one of my colleagues must be teaching the clumsy technique for reasons best known to himself or herself! Lastly, many people had not done as requested and provided specimen output, which at least would have given some indication of how well their systems worked.

Task 3 - The extended calculator.

Most people had little difficulty with this. Many used two "parallel" arrays, one of double values and one of boolean values, to solve the problem of not using variables before they had been given values. An alternative solution just to show how it might be done using an array of objects representing the encapsulated properties of each variable would be as follows:

  import Library.*;
  import java.util.*;

  COMPILER Calc  $CN
  /* Simple four function calculator with 26 memory cells - extended
     P.D. Terry, Rhodes University, 2005 */

    static class MemCell {
      public double value = 0.0;
      public boolean defined = false;
    }

    static MemCell[] mem = new MemCell[26];

  CHARACTERS
    digit      = "0123456789" .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
  TOKENS
    Number     = digit { digit } [ "." { digit } ] .
    Variable   = letter .

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Calc                               (. int index = 0; double value = 0.0;
                                          for (int i = 0; i < 26; i++) mem[i] = new MemCell(); .)
    = { (
            Variable                   (. index = token.val.charAt(0) - 'A'; .)
            WEAK "="
            Expression<out value>      (. mem[index].value = value;
                                          mem[index].defined = true; .)
          | "print"
            Expression<out value>      (. IO.writeLine(value); .)
         ) SYNC ";"
      } EOF .

    Expression<out double expVal>      (. double expVal1 = 0.0; .)
    = Term<out expVal>
      {   "+" Term<out expVal1>        (. expVal += expVal1; .)
        | "-" Term<out expVal1>        (. expVal -= expVal1; .)
      } .

    Term<out double termVal>           (. double termVal1 = 0.0; .)
    = Factor<out termVal>
      {   "*" Factor<out termVal1>     (. termVal *= termVal1; .)
        | "/" Factor<out termVal1>     (. if (termVal1 == 0.0) SemError("division by zero");
                                          else termVal /= termVal1; .)
      } .

    Factor<out double factVal>         (. factVal = 0.0; .)
    =   Number                         (. try {
                                            factVal = Float.parseFloat(token.val);
                                          } catch (NumberFormatException e) {
                                            factVal = 0; SemError("number out of range");
                                          } .)
      | Variable                       (. int index = token.val.charAt(0) - 'A';
                                          if (!mem[index].defined) SemError("undefined value");
                                          else factVal = mem[index].value; .)
      | "(" Expression<out factVal>
        ")" .

  END Calc.

This has altered the grammar to demand that a semicolon follow each statement so that it can be used as a synchronization point.

Task 4 - An assembler pretty printer

This is really rather easy once you get the general idea. A few points are worth making. Firstly, several people attempted to get the desired spacing by arranging for the output to contain "tabs" at suitable points. These have the disadvantage that tab setting differ from one output medium or library to another, and what might look pretty in some situations won't look pretty in others. Better far to use the "fixed width" output routines in the Library. Secondly, it would have been necessary to augment the grammar to handle the extended opcode set that includes such operations as LDC_0. Thirdly (and rather subtly), comments extended from the semicolon up to and including the CR. The convention on Wintel systems is that ends of lines are marked by a CR LF sequence, rather then LF only. Few people had realised that this called for special treatment (and doubtless did not notice, as most editors will handle LF or CR LF as almost equivalent.

  import Library.*;

  COMPILER PVMAsm $CN
  /* Grammar for subset of PVM assembler language
     P.D. Terry, Rhodes University, 2005 */

    public static OutFile output;
    static int count = 0;

  CHARACTERS
    control     = CHR(0) .. CHR(31) .
    Printable   = ANY - control .
    InString    = Printable - '"' .
    Digits      = "0123456789" .
    LF          = CHR(10) .
    CR          = CHR(13) .

  TOKENS
    Number      = [ "-" ] Digits { Digits } .
    String      = '"' { InString } '"' .
    EOL         = LF .
    Comment     = ";" { Printable } CR .

  IGNORE control - LF

  PRODUCTIONS
    PVMAsm
    = { Statement } EOF .

    Statement
    = [ Number ]
      (                                   (. output.write(count, -5); .)
          Instruction
        |                                 (. output.write(" ", -27); .)
      )
      [ Comment                           (. output.write(token.val.substring(0, token.val.length() - 1)); .)
      ] SYNC EOL                          (. output.writeLine(); .)
    .


  /*  Alternative that puts the count in place even for lines with no instruction
    Statement
    = [ Number ]                          (. output.write(count, -5); .)
      (   Instruction
        |                                 (. output.write(" ", -22); .)
      )
      [ Comment                           (. output.write(token.val.Substring(0, token.val.length() - 1)); .)
      ] SYNC EOL                          (. output.writeLine(); .)
    .
  */

    Instruction
    = TwoWord | OneWord | PrintString .

    TwoWord
    = (   "LDA" | "LDC" | "DSP"
        | "LDL" | "STL"
        | "BRN" | "BZE"
      )                                   (. output.write(token.val, -7); .)
      Number                              (. output.write(token.val, -15);
                                             count += 2; .)
    .

    OneWord
    = (   "ADD"   | "AND"   | "ANEW"   | "CEQ"
        | "CGE"   | "CGT"   | "CLE"    | "CLT"
        | "CNE"   | "DIV"   | "HALT"   | "INPB"
        | "INPI"  | "LDV"   | "LDXA"   | "MUL"
        | "NEG"   | "NOP"   | "NOT"    | "OR"
        | "PRNB"  | "PRNI"  | "PRNL"   | "REM"
        | "STO"   | "SUB"   | "LDC-M1"
        | "LDC_0" | "LDC_1" | "LDC_2"  | "LDC_3"
        | "LDA_0" | "LDA_1" | "LDA_2"  | "LDA_3"
        | "LDL_0" | "LDL_1" | "LDL_2"  | "LDL_3"
        | "STL_0" | "STL_1" | "STL_2"  | "STL_3"
      )                                   (. output.write(token.val, -22);
                                             count++; .)
    .

    PrintString
    = "PRNS"                              (. output.write(token.val, -7); .)
      String                              (. output.write(token.val, -15);
                                             count += 2; .)
    .

  END PVMAsm.

Task 5 - Playing with trains again

Some dreadfully complicated solutions were submitted. Try always to find an elegant solution. Here is one, using a single static Boolean field to handle the safety problem:

  import Library.*;

  COMPILER Trains $CN
  /* Grammar for railway trains with simple safety regulations
     P.D. Terry, Rhodes University, 2005 */

    static boolean
      danger, hasFreight, safe;
    static final int  // type of train
      passenger = 0,
      freight   = 1,
      mixed     = 2,
      empty     = 3;
    static int type;

    public static OutFile output;

  IGNORECASE

  COMMENTS FROM "(*" TO "*)" NESTED

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Trains = { OneTrain } EOF .

    OneTrain
    =                                    (. danger = false;
                                            type = empty;
                                            safe = true;
                                            hasFreight = false; .)
      LocoPart
      [ [ GoodsPart                      (. hasFreight = true; .)
        ]
        HumanPart
      ]
      SYNC "."                           (. output.writeLine(" .");
                                            switch(type) {
                                              case passenger:
                                                output.write("passenger train"); break;
                                              case mixed:
                                                output.write("mixed freight/passenger train"); break;
                                              case freight:
                                                output.write("freight train"); break;
                                              case empty:
                                                output.write("empty train"); break;
                                            }
                                            output.write(" - safety regulations ");
                                            if (safe) output.writeLine("obeyed");
                                            else output.writeLine("contravened");
                                            output.writeLine(); .)
      .

    LocoPart
    = "loco"                             (. output.write(" " + token.val); .)
      { "loco"                           (. output.write(" " + token.val); .)
      } .

    GoodsPart
    = Truck                              (. if (danger) {
                                              safe = false;
                                              SemError("fuel truck may not follow loco");
                                            } .)
      { Truck } .

    HumanPart
    = "guard"                            (. output.write(" " + token.val);
                                            type = freight; .)
      |                                  (. if (danger) {
                                              safe = false;
                                              SemError("fuel truck may not precede coach");
                                            } .)
        { "coach"                        (. output.write(" " + token.val); .)
        } "brake"                        (. output.write(" " + token.val);
                                            if (hasFreight) type = mixed;
                                            else type = passenger; .)
      .

    Truck
    = (   (   "coal" | "cold"
            | "open" | "cattle" )        (. danger = false; .)
        | "fuel"                         (. danger = true; .)
      )                                  (. output.write(" " + token.val); .)
    .

  END Trains.

Several people were guided into using a set of state variables remembering the last kind of rolling stock parsed. Here is a solution on those lines:

  import Library.*;

  COMPILER Trains2 $CN
  /* Grammar for railway trains with simple safety regulations
     P.D. Terry, Rhodes University, 2005 */

    static boolean
      hasFreight, safe;
    static final int  // type of train
      passenger = 0,
      freight   = 1,
      mixed     = 2,
      empty     = 3;


    static final int  // kind of last component
      safeTruck = 1,
      fuelTruck = 2,
      humans    = 3,
      loco      = 4;

    static int type, lastSeen;

    public static OutFile output;

  IGNORECASE

  COMMENTS FROM "(*" TO "*)" NESTED

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Trains2 = { OneTrain } EOF .

    OneTrain
    =                                    (. type = empty;
                                            safe = true;
                                            hasFreight = false; .)
      LocoPart
      [ [ GoodsPart                      (. hasFreight = true; .)
        ]
        HumanPart
      ]
      SYNC "."                           (. output.writeLine(" .");
                                            switch(type) {
                                              case passenger:
                                                output.write("passenger train"); break;
                                              case mixed:
                                                output.write("mixed freight/passenger train"); break;
                                              case freight:
                                                output.write("freight train"); break;
                                              case empty:
                                                output.write("empty train"); break;
                                            }
                                            output.write(" - safety regulations ");
                                            if (safe) output.writeLine("obeyed");
                                            else output.writeLine("contravened");
                                            output.writeLine(); .)
      .

    LocoPart
    = "loco"                             (. output.write(" " + token.val); .)
      { "loco"                           (. output.write(" " + token.val); .)
      }                                  (. lastSeen = loco; .)
    .

    GoodsPart
    = Truck { Truck } .

    HumanPart
    = "guard"                            (. output.write(" " + token.val);
                                            type = freight; .)
      |                                  (. if (lastSeen == fuelTruck) {
                                              safe = false;
                                              SemError("fuel truck may not precede coach");
                                            }
                                            lastSeen = humans; .)
        { "coach"                        (. output.write(" " + token.val); .)
        } "brake"                        (. output.write(" " + token.val);
                                            if (hasFreight) type = mixed;
                                            else type = passenger; .)
    .

    Truck
    = (   (   "coal" | "cold"
            | "open" | "cattle" )        (. lastSeen = safeTruck; .)
        | "fuel"                         (. if (lastSeen == loco) {
                                              safe = false;
                                              SemError("fuel truck may not follow loco");
                                            }
                                            lastSeen = fuelTruck; .)
      )                                  (. output.write(" " + token.val); .)
    .

  END Trains2.

Task 6 - The EBNF cross reference generator.

Once again, this is capable of a very simple elegant solution (hint: most Pat Terry problems admit to a simple elegant solution; the trick is to find it, so learn from watching the Expert in Action, and pick up the tricks for future reference). There are only two places in the basic grammar where non-terminals appear, and it is here that we must arrange to insert them into the table:

  import Library.*;

  COMPILER EBNF $CN
  /* Parse a set of EBNF productions
     P.D. Terry, Rhodes University, 2005 */

    public static OutFile output;

  CHARACTERS
    letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    lowline  = "_" .
    control  = CHR(0) .. CHR(31) .
    digit    = "0123456789" .
    noquote1 = ANY - "'" - control .
    noquote2 = ANY - '"' - control .

  TOKENS
    nonterminal = letter { letter | lowline | digit } .
    terminal    = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' .

  COMMENTS FROM "(*" TO "*)"  NESTED

  IGNORE control

  PRODUCTIONS
     EBNF                            (. Table.clearTable(); .)
     = { Production }                (. Table.printTable(); .)
       EOF .

     Production
     = SYNC nonterminal              (. Table.addRef(token.val, true, token.line); .)
       WEAK "=" Expression SYNC "." .

     Expression
     = Term { WEAK "|" Term } .

     Term
     = [ Factor { Factor } ] .

     Factor
     =   nonterminal                 (. Table.addRef(token.val, false, token.line); .)
       | terminal
       | "[" Expression "]"
       | "(" Expression ")"
       | "{" Expression "}" .
  END EBNF.

Of course, the bulk of the effort has to be spent in deriving a suitable table handler. Here is one that builds upon the suggestion in the prac sheet:

  // Handle cross reference table for EBNF productions
  // P.D. Terry, Rhodes University, 2005

  package EBNF;

  import java.util.*;
  import Library.*;

    class Entry {                      // Cross reference table entries
      public String name;              // The identifier itself
      public ArrayList refs;           // Line numbers where it appears
      public Entry(String name) {
        this.name = name;
        this.refs = new ArrayList();
      }
    } // Entry

    class Table {
      static ArrayList list = new ArrayList();

      public static void clearTable() {
      // Clears cross-reference table
        list = new ArrayList();
      }

      public static void addRef(String name, boolean declared, int lineRef) {
      // Enters name if not already there, adds another line reference (negative if at
      // a declaration point in the original set of productions
        int i = 0;
        while (i < list.size() && !name.equals(((Entry)list.get(i)).name)) i++;
        if (i >= list.size()) list.add(new Entry(name));
        ((Entry)list.get(i)).refs.add(new Integer(declared ? -lineRef : lineRef));
      }

      public static void printTable() {
      // Prints out all references in the table
        StringBuffer missing = new StringBuffer();
        for (int i = 0; i < list.size(); i++) {
          boolean isDeclared = false;                    // haven't seen a definition yet
          Entry e = (Entry)list.get(i);                  // note the need to cast
          Parser.output.write(e.name, -18);              // left justify in 18 spaces
          for (int j = 0; j < e.refs.size(); j++) {      // work through the list of references
            int line = ((Integer)e.refs.get(j)).intValue();
            Parser.output.write(line, 5);                // justify in 5 spaces
            isDeclared = isDeclared || line < 0;
          }
          Parser.output.writeLine();
          if (!isDeclared) missing.append(e.name + " "); // build up list of undeclared nonterminals
        }
        Parser.output.writeLine();
        if (missing.length() > 0) {                      // no need if there were none
          Parser.output.writeLine("The following are terminals, or undefined non-terminals");
          Parser.output.writeLine();
          Parser.output.writeLine(missing.toString());
        }
      }

    } // Table

The printTable method above suffers from a possible disadvantage in that multiple occurrences of an non- terminal on one line, as in

Term = [ Factor { Factor } ] .

create unnecessary duplicate entries. These could be eliminated in various ways; the simplest might be to do so at the output stage, rather than when they are added by the addRef method. Please yourself; here is my suggestion.

      public static void printTable() {
      // Prints out all references in the table (eliminate duplicates line numbers)
        StringBuffer missing = new StringBuffer();
        for (int i = 0; i < list.size(); i++) {
          boolean isDeclared = false;                    // haven't seen a definition yet
          Entry e = (Entry)list.get(i);                  // note the need to cast
          Parser.output.write(e.name, -18);              // left justify in 18 spaces
          int last = 0;                                  // impossible line number
          for (int j = 0; j < e.refs.size(); j++) {      // work through the list of references
            int line = ((Integer)e.refs.get(j)).intValue();
            isDeclared = isDeclared || line < 0;
            if (line != last) {                          // a new line reference
              Parser.output.write(line, 5);              // justify in 5 spaces
              last = line;                               // remember we have printed this line
            }
          }
          Parser.output.writeLine();
          if (!isDeclared) missing.append(e.name + " "); // build up list of undeclares nonterminals
        }
        Parser.output.writeLine();
        if (missing.length() > 0) {                      // no need if there were none
          Parser.output.writeLine("The following are terminals, or undefined non-terminals");
          Parser.output.writeLine();
          Parser.output.writeLine(missing.toString());
        }
      }

Several solutions revealed that people either had not thought that far or were confused about the point of the declared argument to the addRef method.