Practical 24 solution

Computer Science 3 - 2006

Programming Language Translation

Practical for Week 24, beginning 9 October 2006 - solutions

As usual, the sources of full solutions for these problems may be found on the course web page as the file PRAC24A.ZIP or PRAC24AC.ZIP.

While there were some splendid submissions, there were also some very weak ones, so please study the suggestions below, as the ability to add attributes and actions to grammars is crucially important if you are to use a tool like Coco. Three general points: Firstly, make sure you incorporate the call to Parser.output.close() to close their output files. If you don't do this, the output is likely to be truncated badly, or even discarded. Secondly, I am amused every year to read submissions that have assignments like

A = B * -1;

instead of the much more obvious and simpler

A = -B;

This happens so often that I can only believe one of my colleagues must be teaching the clumsy technique for reasons best known to himself or herself! Lastly, many people had not done as requested and provided specimen output, which at least would have given some indication of how well their systems worked.

Tasks 3 and 4 - The Boolean calculator.

Most people had little difficulty with this. Most used two "parallel" arrays, one of the actual values for the variables, and one to mark those that had not yet been assigned values.

Note that the solution below uses the Variable production to extract the index of the variable, not its name, and the use of toUpperCase() ensures that the system ignores case completely. The IGNORECASE directive applies only to key words.

  import Library.*;
  import java.util.*;

  COMPILER Bool $CN
  /* Boolean expression calculator Java version
     P.D. Terry, Rhodes University, 2006 */

    static boolean[] mem = new boolean[26];
    static boolean[] defined = new boolean[26];

  IGNORECASE

  CHARACTERS
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
  TOKENS
    variable   = letter .

  COMMENTS FROM "(*" TO "*)"  NESTED
  COMMENTS FROM "/*" TO "*/"  NESTED

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Bool                               (. int index = 0;
                                          boolean value = false;
                                          for (int i = 0; i < 26; i++) defined[i] = false; .)
    = { ( Variable<out index>
          "=" Expression<out value>    (. mem[index] = value;
                                          defined[index] = true; .)
        |
          "print"
          Expression<out value>        (. IO.writeLine(value); .)
        )
        SYNC ";"
      } EOF .

    Variable<out int index>
    = variable                         (. index = Character.toUpperCase(token.val.charAt(0)) - 'A'; .)
    .

    Expression<out boolean value>      (. boolean termValue; .)
    = Term<out value>
      { Or Term<out termValue>         (. value = value || termValue; .)
      } .

    Term<out boolean value>            (. boolean factValue; .)
    = Factor<out value>
      { [ And ] Factor<out factValue>  (. value = value && factValue; .)
      } .

    Factor<out boolean value>          (. value = false; .)
    =   "NOT" Factor<out value>        (. value = ! value; .)
      | Primary<out value>
        { "'"                          (. value = ! value; .)
        }
    .

    Primary<out boolean value>         (. int index;
                                          value = false; .)
    =   True                           (. value = true; .)
      | False                          (. value = false; .)
      | Variable<out index>            (. if (!defined[index])
                                            SemError("variable not defined");
                                          value = mem[index]; .)
      | "(" Expression<out value> ")"
    .

    True   = "TRUE" | "1" .
    False  = "FALSE" | "0" .
    And    = "AND" | "&&" | "." .
    Or     = "OR" | "||" | "+" .
  END Bool.

The semicolon following each statement can be used as a convenient synchronization point.

Task 5 - A CSC 201 assembler lister

There was some very clumsy code submitted, and I do not recall anybody attempting what my solution achieves - namely only to print out blank character strings when they are really needed. Most solutions would have produced many lines with long sequences of trailing spaces on lines where opcodes or addresses were missing. And many solutions did not handle address fields with multiple terms properly, if at all.

The solution given below incorporates the idea of computing the length of what has been copied to an output line up to the point where the comment, if any, is detected. This could vary from zero (for lines that have only comments) to 9 (if there is a label only) to 13 (if there is a one byte opcode) to quite a lot more (if there is a two byte opcode with an address field). The optional comment routine then adds spaces to this - but only if the comment is really present. Note the trick employed to ensure that a comment on an END line and any trailing comments on lines after this will be correctly spaced as well.

  import Library.*;
  import java.util.*;

  COMPILER ASM $NC
  /* Define Assembler language as used in CSC 201
     Comments defined as tokens
     Pretty printing and cross reference listing features added
     P.D. Terry, Rhodes University, 2006 */

  public static OutFile output;

  static int
    labelWidth  = 9,
    hasOpCode   = 13,
    hasNoOpCode = 35,
    toComment   = hasNoOpCode - hasOpCode;

  static boolean
    hasLabel,
    stringAllowed,
    noMoreTerms;

  IGNORECASE

  CHARACTERS
    lf         = CHR(10) .
    cr         = CHR(13) .
    control    = CHR(0) .. CHR(31) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    binDigit   = "01" .
    hexDigit   = digit + "ABCDEFabcdef" .
    charCh     = ANY - "'" - control .
    printable  = ANY - control .

  TOKENS
    number     = digit { digit } | binDigit { binDigit } "%" | digit { hexDigit } "H" .
    identifier = letter { letter | digit } .
    string     = "'" charCh { charCh } "'" .
    EOL        = cr lf | lf .
    comment    = ";" { printable } .

  PRODUCTIONS
    ASM                                 (. Table.clearTable(); .)
    = Begin StatementSequence End       (. Table.printTable(); .)
    .

    Begin
    = { OptionalComment<hasNoOpCode> }
      "BEG"                             (. output.write("BEG", hasOpCode); .)
      OptionalComment<toComment>
    .

    End
    = "END"                             (. output.write("END", hasOpCode); .)
    { OptionalComment<toComment>        (. toComment = hasNoOpCode; .)
    } .

    OptionalComment<int toComment>      (. String commentText; .)
    = [ Comment<out commentText>        (. output.write(" ", toComment);
                                           output.write(commentText); .)
      ]
    SYNC EOL                            (. output.writeLine(); .)
    .

    StatementSequence                   (. int width; .)
    = { Statement<out width>
        OptionalComment<hasNoOpCode - width>
      }
    .

    Statement<out int width>            (. int adrWidth = 0;
                                           stringAllowed = false;
                                           width = 0; hasLabel = false; .)
    = [ Label                           (. width = labelWidth; hasLabel = true;
                                           output.write(token.val, -labelWidth);
                                           Table.addRef(token.val, true, token.line); .)
      ]
      [                                 (. output.write(" ", labelWidth + 1 - width); width = hasOpCode; .)
        (   OneByteOp                   (. output.write(token.val, -3); .)
          | TwoByteOp                   (. output.write(token.val, -3); .)
            Address<out adrWidth>       (. width += adrWidth; .)
        )
      ]
    .

    OneByteOp
    =    "ASR"  | "CLA"  | "CLC"
       | "CLV"  | "CLX"  | "CMA"
       | "CMC"  | "DEC"  | "DEX"
       | "HLT"  | "INA"  | "INB"
       | "INC"  | "INH"  | "INI"
       | "INX"  | "NOP"  | "OTA"
       | "OTB"  | "OTC"  | "OTH"
       | "OTI"  | "POP"  | "PSH"
       | "RET"  | "SHL"  | "SHR"
       | "TAX"
    .

    TwoByteOp
    =    "ACI"  | "ACX"  | "ADC"
       | "ADD"  | "ADI"  | "ADX"
       | "ANA"  | "ANI"  | "ANX"
       | "BCC"  | "BCS"  | "BGE"
       | "BGT"  | "BLE"  | "BLT"
       | "BNG"  | "BNZ"  | "BPZ"
       | "BRN"  | "BVC"  | "BVS"
       | "BZE"  | "CMP"  | "CPI"
       | "CPX"  | "DC"                  (. stringAllowed = true; .)
       | "DS"   | "EQU"                 (. if (!hasLabel) SemError("EQU directive must be labelled"); .)
       | "JGE"  | "JGT"  | "JLE"
       | "JLT"  | "JSR"  | "LDA"
       | "LDI"  | "LDX"  | "LSI"
       | "LSP"  | "ORA"  | "ORG"        (. if (hasLabel) SemError("ORG directive must not be labelled"); .)
       | "ORI"  | "ORX"  | "SBC"
       | "SBI"  | "SBX"  | "SCI"
       | "SCX"  | "STA"  | "STX"
       | "SUB"
    .

    Address<out int width>              (. int termWidth;
                                           noMoreTerms = false; .)
    =                                   (. output.write(' '); .)
      Term<out termWidth>               (. width = termWidth + 1;
                                           stringAllowed = false; .)
      {                                 (. if (noMoreTerms) SemError("invalid address"); .)
        (   '+'                         (. output.write(" + "); .)
            Term<out termWidth>         (. width += termWidth + 3; .)
          | '-'                         (. output.write(" - "); .)
            Term<out termWidth>         (. width += termWidth + 3; .)
        )
      }
    .

    Term<out int width>
    = (   Label                         (. Table.addRef(token.val, false, token.line); .)
        | IntConst
        | StringConst                   (. if (token.val.length() > 3) {
                                             if (!stringAllowed) SemError("strings not allowed here");
                                             noMoreTerms = true;
                                           } .)
        | '*'
      )                                 (. output.write(token.val); width = token.val.length(); .)
    .

    Label
    = identifier
    .

    IntConst
    = number
    .

    StringConst
    = string
    .

    Comment<out String commentText>
    = comment                           (. commentText = token.val; .)
    .

  END ASM.

Task 6 - Some constraint checks

This was not very well done. While many managed to do the easy checks on ORG and EQU, several groups assumed that a check was needed for the LDI opcode -but this is not so. None of the opcodes other than DC can take a "string" address! Furthermore, DC can take other addresses as well, a point that some people missed. Lastly, I don't think anybody really found a way to handle the problems that must be reported if the address for a DC opcode combines a long string with other terms.

So have a good look at the code, and learn from it!

Task 7 - A cross reference generator

There was some very clumsy code submitted; there was also some rather clean code. I decided to change and simplify the constructor for objects of the Entry class in my solution. Inevitably there were some who tried to use something more complex than the ArrayList, perhaps for reasons best known to themselves, but I think the code given below is clean and simple. Several students had not realised that the line number for a token returned by the scanner appears in the field token.line.

Task 8 - Consistent use of labels

The checks on the incorrect redefinition of labels are best made as an attempt is made to add a label to the symbol table. Several solutions revealed that people either had not thought that far, or were confused about the point of the defining argument to the addRef method.

The lists of unreferenced and undefined labels can only be made after the entire program has been parsed, and are conveniently incorporated into the Table.printTable() method. Few submissions made us of the StringBuffer class to capture the lists of these labels, as you had been advised to do.

  // Handle cross reference table for assembler labels
  // P.D. Terry, Rhodes University, 2006

  package ASM;

  import java.util.*;
  import Library.*;

    class Entry {                         // Cross reference table entries
      public String name;                 // the label's name
      public ArrayList refs;              // its list of line numbers
      public boolean defined = false;     // true if it has appeared in a label field
      public boolean referenced = false;  // true if it has appeared in an address field

      public Entry(String name) {
        this.name = name;
        this.refs = new ArrayList();
      }
    } // Entry

    class Table {
      static ArrayList list = new ArrayList();

      public static void clearTable() {
      // Clears cross-reference table
        list = new ArrayList();
      }

      public static void addRef(String name, boolean defining, int lineRef) {
      // Enters name if not already there, adds another line reference (negative if
      // it appears in the label field in the source text) and
      // updates the status of the label (referenced or defined)
        int i = 0;
        while (i < list.size() && !name.equals(((Entry)list.get(i)).name)) i++;
        if (i >= list.size()) list.add(new Entry(name));
        Entry e = (Entry)list.get(i);   // i is now the index of the old or new reference!
        if (defining) {                 // mark as defined, but check first
          if (e.defined) Parser.SemError("label already defined");
          e.defined = true;
        }
        else e.referenced = true;       // mark as referenced (no need to check)
        e.refs.add(new Integer(defining ? -lineRef : lineRef));
      }

      public static void printTable() {
      // Prints out all references in the table (eliminate duplicates line numbers)
        StringBuffer
          undefined    = new StringBuffer(),
          unreferenced = new StringBuffer();
        Parser.output.writeLine("\nCross reference table\n");
        for (int i = 0; i < list.size(); i++) {
          Entry e = (Entry)list.get(i);
          Parser.output.write(e.name, -18);              // left justify in 18 spaces
          int last = 0;                                  // impossible line number
          for (int j = 0; j < e.refs.size(); j++) {
            int line = ((Integer)e.refs.get(j)).intValue();
            if (line != last) {
              Parser.output.write(line, 5);              // left justify in 5 spaces
              last = line;                               // remember we have printed this line
            }
          }
          Parser.output.writeLine();
          if (!e.defined)    undefined.append(e.name + " ");
          if (!e.referenced) unreferenced.append(e.name + " ");
        } // for
        if (undefined.length() > 0) {                    // this is a serious error - report it
          Parser.SemError("Undefined labels - " + undefined.toString());
          Parser.output.writeLine("\nUndefined labels - " + undefined.toString());
        }
        if (unreferenced.length() > 0)                   // for information only
          Parser.output.writeLine("\nUnreferenced labels - " + unreferenced.toString());
      }

  /* a simpler version that will print out duplicate line references if required is

      public static void printTable() {
      // Prints out all references in the table (eliminates duplicate line numbers)
        StringBuffer
          undefined    = new StringBuffer(),
          unreferenced = new StringBuffer();
        Parser.output.writeLine("\nCross reference table\n");
        for (int i = 0; i < list.size(); i++) {
          Entry e = (Entry)list.get(i);
          Parser.output.write(e.name, -18);              // left justify in 18 spaces
          for (int j = 0; j < e.refs.size(); j++) {
            int line = ((Integer)e.refs.get(j)).intValue();
            Parser.output.write(line, 5);                // left justify in 5 spaces
          }
          Parser.output.writeLine();
          if (!e.defined)    undefined.append(e.name + " ");
          if (!e.referenced) unreferenced.append(e.name + " ");
        } // for
        if (undefined.length() > 0) {                    // this is a serious error - report it
          Parser.SemError("Undefined labels - " + undefined.toString());
          Parser.output.writeLine("\nUndefined labels - " + undefined.toString());
        }
        if (unreferenced.length() > 0)                   // for information only
          Parser.output.writeLine("\nUnreferenced labels - " + unreferenced.toString());
      }

    */

    } // Table