Practical for week 24

Computer Science 3 - 2012

Programming Language Translation

Practical for Week 24, beginning 8 October 2012 - solutions

As usual, the sources of full solutions for these problems may be found on the course web page as the file PRAC24A.ZIP or PRAC24AC.ZIP.

While there were some splendid submissions, there were also some very weak ones, so please study the suggestions below, as the ability to add attributes and actions to grammars is crucially important if you are to use a tool like Coco.

Few people had done as requested and provided specimen output, which at least would have given some indication of how well their systems worked.

Task 3 - The extended calculator.

Most people had little difficulty with this. Many used two "parallel" arrays, one of double values and one of boolean values, to solve the problem of not using variables before they had been given values. An alternative solution just to show how it might be done using an array of objects representing the encapsulated properties of each variable would be as follows:

  import library.*;
  import java.util.*;

  COMPILER Calc  $CN
  /* Simple four function calculator with 26 memory cells - extended
     P.D. Terry, Rhodes University, 2012 */

  // We can include the simple class definition "in-line" as here:

    static class MemCell {
      public double value = 0.0;
      public boolean defined = false;
    }

    static MemCell[] mem = new MemCell[26];

  CHARACTERS
    digit      = "0123456789" .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .

  TOKENS
    Number     = digit { digit } [ "." { digit } ] .
    Variable   = letter .

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Calc                               (. int index = 0; double value = 0.0;
                                          for (int i = 0; i < 26; i++) mem[i] = new MemCell(); .)
    = { (
            Variable                   (. index = token.val.charAt(0) - 'A'; .)
            WEAK "="
            Expression<out value>      (. mem[index].value = value;
                                          mem[index].defined = true; .)
          | "print"
            Expression<out value>      (. IO.writeLine(value); .)
         ) SYNC ";"
      } EOF .

    Expression<out double expVal>      (. double expVal1 = 0.0; .)
    = Term<out expVal>
      {   "+" Term<out expVal1>        (. expVal += expVal1; .)
        | "-" Term<out expVal1>        (. expVal -= expVal1; .)
      } .

    Term<out double termVal>           (. double termVal1 = 0.0; .)
    = Factor<out termVal>
      {   "*" Factor<out termVal1>     (. termVal *= termVal1; .)
        | "/" Factor<out termVal1>     (. if (termVal1 == 0.0) SemError("division by zero");
                                          else termVal /= termVal1; .)
      } .


    Factor<out double factVal>         (. factVal = 0.0;
                                          double factVal2; .)
    =   Number                         (. try {
                                            factVal = Float.parseFloat(token.val);
                                          } catch (NumberFormatException e) {
                                            factVal = 0; SemError("number out of range");
                                          } .)
      | Variable                       (. int index = token.val.charAt(0) - 'A';
                                          if (!mem[index].defined) SemError("undefined value");
                                          else factVal = mem[index].value; .)
      | "(" Expression<out factVal> ")"
      | "sqrt"
        "(" Expression<out factVal>    (. if (factVal < 0) {
                                            SemError("square root of negative value");
                                            factVal = 0.0;
                                          }
                                          else factVal = Math.sqrt(factVal); .)
        ")"
      | "max"
        "(" Expression<out factVal>
            ","
            Expression<out factVal2>   (. if (factVal2 > factVal) factVal = factVal2; .)
        ")" .

  END Calc.

This has altered the grammar to demand that a semicolon follow each statement so that it can be used as a synchronization point.

An alternative way of introducing the max function would be to allow it to have multiple arguments. This is easily done:

      | "max"
        "(" Expression<out factVal>
            { WEAK ","
              Expression<out factVal2> (. if (factVal2 > factVal) factVal = factVal2; .)
            }
        ")" .

Task 4 - Who are our most highly qualified staff?

This was intended to be quite simple. A point many people missed was that first names had to be replaced by initials. Here is one solution:

    import library.*;

    COMPILER Staff $CN
    /* Describe a list of academic staff in a department, and extract those with PhD degrees
       P.D. Terry, Rhodes University, 2012 */

      static StringBuilder sb;
      static String lastName;

    CHARACTERS
      uLetter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      lLetter   = "abcdefghijklmnopqrstuvwxyz" .
      letter    = uLetter + lLetter .
      control   = CHR(0) .. CHR(31) .
      varsity   = ANY - control - ")" .

    TOKENS
      name      = uLetter { letter | "'" uLetter | "-" uLetter } .
      initial   = uLetter "." .
      university= '(' { varsity } ')' .


    IGNORE control

    PRODUCTIONS
      Staff     = { Person } EOF .

      Person                                           (. sb = new StringBuilder(); .)
      = [ Title ] FullName { "," Degree } SYNC "."  .

      FullName  = NameLast { NameLast } .

      NameLast  = { initial                            (. sb.append(token.val); .)
                  } name                               (. sb.append(token.val.charAt(0));
                                                          sb.append('.');
                                                          lastName = token.val; .) .

      Title
      = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" .

      Degree
      = (  "BA" | "BSc" | "BCom" | "BMus" | "BEd"
          | "BSocSci"   | "BSc(Hons)"
          | "BCom(Hons)" | "MA" | "MSc"
          | "PhD"                                      (. String initials = sb.toString();
                                                          initials = initials.substring(0, initials.length() - 2);
                                                          IO.writeLine("Dr "  + initials + " " + lastName); .)
        )
        [ university ] .

    END Staff.

The solution above appends an initial corresponding to the surname (the very last name) to the string builder for the initials, and then removes it again before the final name is printed. This might strike you as a bit inelegant. Indeed it should! Terry Theorem 2: There is always a better way:

    PRODUCTIONS

      FullName  = NameLast {                           (. sb.append(lastName.charAt(0));
                                                          sb.append('.'); .)
                  NameLast } .

      NameLast  = { initial                            (. sb.append(token.val); .)
                  } name                               (. lastName = token.val; .)  .

      Degree
      = (  "BA" | "BSc" | "BCom" | "BMus" | "BEd"
          | "BSocSci"   | "BSc(Hons)"
          | "BCom(Hons)" | "MA" | "MSc"
          | "PhD"                                      (. String initials = sb.toString();
                                                          IO.writeLine("Dr "  + initials + " " + lastName); .)
        ) [ university ] .

Note the ability to introduce actions before a non-terminal is parsed, as shown in the FullName production. Kind of cute, don't you think?

There was a tendency to write this sort of code:

    COMPILER Staff $CN
    /* When will some of you learn to put your names and a brief description at the start of your files? */

**    static StringBuilder sb = new StringBuilder();                                      // initialise

      ....

      Degree
      = (  "BA" | "BSc" | "BCom" | "BMus" | "BEd"
          | "BSocSci"   | "BSc(Hons)"
          | "BCom(Hons)" | "MA" | "MSc"
          | "PhD"                                      (. String initials = sb.toString();
                                                          IO.writeLine("Dr "  + initials + " " + lastName);
**                                                        sb = new StringBuilder();   // re-initialise .)
        ) [ university ] .

which strikes me as odd - have you really been taught to end while loop bodies with "initialisation" code?

Mind you, on that theme, somebody clearly teaches students to write a = (-1) * b; rather than more simply a = -b, not to mention that other horror, if (someBoolExpression == true) ... rather than simply if (someBooleanExpression) ...

Tasks 5 - 7 - An assembler for the PVM

Since these tasks each added one refinement at a time to a working system, it should suffice simply to produce a fully integrated solution. In summary, you were asked to

Add a few opcodes to the language;
Deal with escape sequences in the strings used in PRNS operations;
Add the ability to use names for variables, rather than ordinal numbers, and for the assemnler to keep track of these in a table, in which would be recorded their offset addresses for use in LDL, LDA and STL operations;
Print a table of cross references, giving a list of the line numbers in which reference had been made to each variable, and also a table of line numbers for the declaration and use of each of the labels labels used in BRN and BZE operations;
Pretty-print the source code;

Each of these refinements is relatively easy to achieve, with the aid of the parsing environment that is built and maintained in the synbol table.

An attributed grammar follows. In some respects this is very close to the one presented in the text book in Chapter 11. The code generator is as in Chapter 11, and the version of the PVM is essentially one that corresponds to the ones used in Practical 20 - however, for the curious, this version of the PVM is a little more efficient, as the operations in each of the "case arms" of the switch statement in the interpreter are coded using direct manpulation of the stack pointer rather than calls to push() amd pop().

    import java.util.*;
    import library.*;

    COMPILER Assem $NC
    /* Simple assembler for the PVM - Java version
       P.D. Terry, Rhodes University, 2012 */

      static final boolean
        known = true;

      /* By naming the fieldwidths we would easily be able to change the appearance of
         pretty-printed programs should we wish to do so. */

      static final int                      // for pretty-printing
        LabelWidth    = 8,
        Opcode2Width  = 7,
        Address2Width = 15,
        Opcode1Width  = Opcode2Width + Address2Width;

      public static OutFile pretty;         // Pretty-printer and cross-reference output

      // This next method might better be located in the code generator.  Traditionally
      // it has been left in the ATG file, but that might change in future years
      //
      // Not that while sequences like \n \r and \t result in special mappings to lf, cr and tab
      // other sequences like \x \: and \9 simply map to x, ; and 9 .  Most students don't seem
      // to know this!

      static String unescape(String s) {
      /* Replaces escape sequences in s by their Unicode values */
        StringBuilder buf = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
          if (s.charAt(i) == '\\') {
            switch (s.charAt(i+1)) {
              case '\\': buf.append('\\'); break;
              case '\'': buf.append('\''); break;
              case '\"': buf.append('\"'); break;
              case  'r': buf.append('\r'); break;
              case  'n': buf.append('\n'); break;
              case  't': buf.append('\t'); break;
              case  'b': buf.append('\b'); break;
              case  'f': buf.append('\f'); break;
              default:   buf.append(s.charAt(i+1)); break;
            }
            i += 2;
          }
          else {
            buf.append(s.charAt(i));
            i++;
          }
        }
        return buf.toString();
      } // unescape

    IGNORECASE

    CHARACTERS
      lf         = CHR(10) .
      cr         = CHR(13) .
      backSlash  = CHR(92) .
      control    = CHR(0) .. CHR(31) .
      letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      digit      = "0123456789" .
      printable  = ANY - control .
      stringCh   = printable - '"' - backSlash .

      /* Since we need to retain comments we cannot use the COMMENTS FROM directive. Since each statement
         ends at the end of a line we must be careful.  This won't work

              comment = ";" { printable } lf .

         and although this works

              comment = ";" { printable } cr .

         it relies on there being a CR/lf sequence in the source code - which is true for Wintel systems,
         but not on Unix.  The best solution is probably as shown below: */

    TOKENS
      identifier = letter { letter | digit } .
      number     = [ '+' | '-' ] digit { digit } .
      label      = letter { letter | digit } ":" .
      stringLit  = '"' { stringCh | backSlash printable } '"' .
      EOL        = lf .
      comment    = ";" { printable } .


    IGNORE CHR(9) .. CHR(13) - lf

    PRODUCTIONS

      /* Note the following syntax, which allows any number of empty lines to appear before ASSEM,
         between ASSEM and BEGIN and after the final full stop.  It will, however; not permit you to have
         comments attached to the ASSEM or BEGIN, or to have line with only comments preceding the BEGIN.
         As an easy exercise you might consider what you would need to change to permit such comments.
      */

      Assem
      = { EOL }
        "ASSEM"                     (. pretty.writeLine("ASSEM"); .)
        { EOL }
        "BEGIN"                     (. pretty.writeLine("BEGIN"); .)
        EOL
        { Statement }
        "END" "." { EOL }           (. LabelTable.checkLabels();
                                       pretty.writeLine("END.");
                                       pretty.writeLine();
                                       LabelTable.listReferences(pretty);
                                       VarTable.listReferences(pretty);
                                       pretty.close(); .) .

      /* Note the use of an "action" applied to an empty option, as mentioned in the handout.  There are
         other ways of doing this pretty-printing, of course.  You will note that in the way suggested
         here, much use is made of the "left justified" output routines, which work very well in this
         sort of application. */

      Statement
      = (   Label                   (. pretty.write(token.val, -LabelWidth); .)
          |                         (. pretty.write(" ", -LabelWidth); .)
        )
        (   Instruction
          |                         (. pretty.write(" ", -Opcode1Width); .)
        )
        [ comment                   (. pretty.write(token.val); .)
        ] SYNC EOL                  (. pretty.writeLine(); .) .

      Instruction
      = TwoWord | OneWord | Branch | WriteString .

      /* Adding the extra opcodes as key words is very straightforward! */

      OneWord
      = (   "ADD"    | "AND"   | "ANEW"  | "CEQ"   | "CGE"   | "CGT"   | "CLE"   | "CLT"
          | "CNE"    | "DIV"   | "HALT"  | "INPB"  | "INPI"  | "LDV"   | "LDXA"  | "MUL"
          | "NEG"    | "NOP"   | "NOT"   | "OR"    | "PRNB"  | "PRNI"  | "PRNL"  | "REM"
          | "LDL_0"  | "LDL_1" | "LDL_2" | "LDL_3" | "STL_0" | "STL_1" | "STL_2" | "STL_3"
          | "LDA_0"  | "LDA_1" | "LDA_2" | "LDA_3" | "LDC_0" | "LDC_1" | "LDC_2" | "LDC_3"
          | "LDC_M1" | "STO"   | "SUB"   | "INC"   | "DEC"
        )                            (. CodeGen.oneWord(token.val);
                                        pretty.write(token.val, -Opcode1Width); .) .

      /* Note that the DSP and LDC opcodes must still take an arguent limited to a known constant.  I
         don't think anyone realised this.  In a better assembler one would probably be able to "name"
         magic number constants with syntax like

               MAX   EQU   1234   ; define constant MAX
                     LDC   MAX    ; Push 1234 onto the evaluation stack

         but that will have to wait for another day.

         Note that the method for retrieving the offset address of a variable is now passed the line
         number of the instruction where that label is mentioned.  There is no need to count the lines
         separately, as some people try to do - the token returned by the scanner has the fields
         token.line and token.col which can be used to identify the start of the token in the source text
         being parsed */

      TwoWord                        (. int value = 0;
                                        String name; .)
      =   ( "DSP" | "LDC" )          (. String mnemonic = token.val;
                                        pretty.write(mnemonic, -Opcode2Width); .)
            Number<out value>        (. CodeGen.twoWord(mnemonic, value);
                                        pretty.write(token.val, -Address2Width); .)
        |
          ( "LDA" | "LDL" | "STL" )  (. String mnemonic = token.val;
                                        pretty.write(mnemonic, -Opcode2Width); .)
          (   Number<out value>
            | Ident<out name>        (. value = VarTable.findOffset(name, token.line); .)
          )                          (. CodeGen.twoWord(mnemonic, value);
                                        pretty.write(token.val, -Address2Width); .) .

      Number<out int value>
      = number                       (. try {
                                          value = Integer.parseInt(token.val);
                                        } catch (NumberFormatException e) {
                                          value = 0; SemError("number out of range");
                                        } .) .

      WriteString                    (. String str; .)
      = "PRNS"                       (. String mnemonic = token.val;
                                        pretty.write(mnemonic, -Opcode2Width); .)
        StringConst<out str>         (. CodeGen.writeString(str);
                                        pretty.write(token.val, -Address2Width); .) .

      /* In this system, strings appear only in PRNS statements.  The string returned by the scanner
         includes the delimitining quotes " at either ende which must be stripped off, and any internal
         escape sequences like \n \\ and \x must be replaced by the byte value of the (usually control)
         character so designated. */

      StringConst<out String str>
      =  stringLit                   (. str = unescape(token.val.substring(1, token.val.length() - 1)); .) .

      /* When a label field is encountered, all that is needed to help with the cross-reference of labels
         is to add the line number when a label is declared, if this happens before it is used, or to add
         the reference if it is used first and the branches to it are backpatched later.  By supplying
         the negative of the line number the declarations will be highlihted later when the list is
         printed */

      Label                          (. Label lab; .)
      = label                        (. String name = token.val.substring(0, token.val.length() - 1).toLowerCase();
                                        LabelEntry entry = LabelTable.find(name);
                                        if (entry == null) {
                                          lab = new Label(known);
                                          LabelTable.insert(new LabelEntry(name, lab, -token.line));
                                        }
                                        else if (entry.label.isDefined())
                                          SemError("redefined label");
                                        else {
                                          entry.label.here();
                                          entry.addReference(-token.line);
                                        } .) .

      /* When labels are used in the addressfields of branch instructions, a new reference (with a
         positive line number) is made if the label has not previously been seen, or an additional
         reference is added to the corresponding list if the location of the label is already known. */

      Branch                         (. int target;
                                        String name;
                                        Label lab; .)
      = ( "BRN" | "BZE" )            (. String mnemonic = token.val;
                                        pretty.write(mnemonic, -Opcode2Width);  .)
        (   Number<out target>       (. CodeGen.twoWord(mnemonic, target); .)
          | Ident<out name>          (. LabelEntry entry = LabelTable.find(name);
                                        if (entry == null) {
                                          lab = new Label(!known);
                                          LabelTable.insert(new LabelEntry(name, lab, token.line));
                                        }
                                        else {
                                          lab = entry.label;
                                          entry.addReference(token.line);
                                        }
                                        CodeGen.branch(mnemonic, lab); .)
        )                            (. pretty.write(token.val, -Address2Width); .) .

      Ident<out String name>
      = identifier                   (. name = token.val.toLowerCase(); .) .

    END Assem.

The table handler I hacked together for this solution (before issuing a bonus challenge to improve on it) was as follows - note that there are, effectively two table handlers, one for labels and one for variables (but with a number of close similarities). The changes from the sytstem supplied in the tool kit are highlighted in the margin.

    // Handle label table for simple Parva assembler
    // P.D. Terry, Rhodes University, 2012

    package Assem;

    import java.util.*;
    import library.*;

      class LabelEntry {

        public String name;
        public Label label;
 **     public ArrayList<Integer> refs = null;

        public LabelEntry(String name, Label label, int lineNumber) {
          this.name  = name;
          this.label = label;
 **       this.refs  = new ArrayList<Integer>();
 **       this.refs.add(lineNumber);
        }

 **     public void addReference(int lineNumber) {
 **       this.refs.add(lineNumber);
 **     } // addReference

      } // end LabelEntry

    // -------------------------------------------------------------------------------------

      class LabelTable {

        private static ArrayList<LabelEntry> list = new ArrayList<LabelEntry>();

        public static void insert(LabelEntry entry) {
        // Inserts entry into label table
          list.add(entry);
        } // insert

        public static LabelEntry find(String name) {
        // Searches table for label entry matching name.  If found then returns entry.
        // If not found, returns null
          int i = 0;
          while (i < list.size() && !name.equals(list.get(i).name)) i++;
          if (i >= list.size()) return null; else return list.get(i);
        } // find

        public static void listReferences(OutFile output) {
        // Cross reference list of all labels used on output file
 **       output.writeLine();
 **       output.writeLine("Labels:");
 **       output.writeLine();
 **       if (list.size() == 0) output.writeLine("none");
 **       else
 **         for (LabelEntry entry : list) {
 **           output.write(entry.name, -10);
 **           if (!entry.label.isDefined())
 **             output.write(" (undefined) ");
 **           else output.write(" (defined)   ");
 **           for (int line : entry.refs) output.write(line, 5);
 **           output.writeLine();
 **         }
        } // listReferences

        public static void checkLabels() {
        // Checks that all labels have been defined (no forward references outstanding)
          for (int i = 0; i < list.size(); i++) {
            if (!list.get(i).label.isDefined())
              Parser.SemError("undefined label - " + list.get(i).name);
          }
        } // checkLabels

      } // end LabelTable

    // -------------------------------------------------------------------------------------

      class VariableEntry {

        public String name;
        public int offset;
 **     public ArrayList<Integer> refs = null;

        public VariableEntry(String name, int offset, int lineNumber) {
          this.name   = name;
          this.offset = offset;
 **       this.refs   = new ArrayList<Integer>();
 **       this.refs.add(lineNumber);
        }

 **     public void addReference(int lineNumber) {
 **       this.refs.add(lineNumber);
 **     } // addReference

      } // end VariableEntry

    // -------------------------------------------------------------------------------------

      class VarTable {

        private static ArrayList<VariableEntry> list = new ArrayList<VariableEntry>();
        private static int varOffset = 0;

        public static int findOffset(String name, int lineNumber) {
        // Searches table for variable entry matching name.  If found then returns the known offset.
        // If not found, makes an entry and updates the master offset
 **       int i = 0;
 **       while (i < list.size() && !name.equals(list.get(i).name)) i++;
 **       if (i >= list.size()) {
 **         list.add(new VariableEntry(name, varOffset, lineNumber));
 **         return varOffset++;
 **       }
 **       else {
 **         list.get(i).addReference(lineNumber);
 **         return list.get(i).offset;
 **       }
        } // findOffset

        public static void listReferences(OutFile output) {
        // Cross reference list of all variables on output file
 **       output.writeLine();
 **       output.writeLine("Variables:");
 **       output.writeLine();
 **       if (list.size() == 0) output.writeLine("none");
 **       else
 **         for (VariableEntry entry : list) {
 **           output.write(entry.name, -10);
 **           output.write("(offset  " + entry.offset, -10); output.write(") ");
 **           for (int line : entry.refs) output.write(line, 5);
 **           output.writeLine();
 **         }
        } // listReferences

      } // end VarTable