Practical for week 25

Computer Science 3 - 2012

Programming Language Translation

Practical for Weeks 25 - 26, beginning 15 October 2012 - solutions

Sources of full solutions for these problems may be found on the course web page as the file PRAC25A.ZIP (Java) or PRAC25AC.ZIP (C#).

Task 2 - Use of the debugging and other pragmas

The extra pragmas needed in the refined Parva compiler are easily introduced. We need some static fields:

    public static boolean
      debug    = false,
 *    listCode = false,
 *    warnings = true;

The definitions of the pragmas are done in terms of these:

    PRAGMAS
      DebugOn      = "$D+" .                (. debug = true; .)
      DebugOff     = "$D-" .                (. debug = false; .)
 *    CodeOn       = "$C+" .                (. listCode  = true; .)
 *    CodeOff      = "$C-" .                (. listCode  = false; .)
 *    WarnOn       = "$W+" .                (. warnings  = true; .)
 *    WarnOff      = "$W-" .                (. warnings  = false; .)
 *    OptOn        = "$O+" .                (. optimize = true; .)
 *    OptOff       = "$O-" .                (. optimize = false; .)

It is convenient to be able to set the options with command line parameters as well. This involves a straightforward change to the Parva.frame file:

    for (int i = 0; i < args.length; i++) {
      if (args[i].toLowerCase().equals("-l")) mergeErrors = true;
      else if (args[i].toLowerCase().equals("-d")) Parser.debug = true;
      else if (args[i].toLowerCase().equals("-w")) Parser.warnings = false;
 *    else if (args[i].toLowerCase().equals("-c")) Parser.listCode = true;
 *    else if (args[i].toLowerCase().equals("-o")) Parser.optimize = true;
      else inputName = args[i];
    }
    if (inputName == null) {
      System.err.println("No input file specified");
 *    System.err.println("Usage: Parva [-l] [-d] [-w] [-c] [-o] source.pav");
      System.err.println("-l directs source listing to listing.txt");
      System.err.println("-d turns on debug mode");
      System.err.println("-w suppresses warnings");
 *    System.err.println("-c lists object code (.cod file)");
 *    System.err.println("-o produces better code");
      System.exit(1);
    }

Finally, the following change to the frame file gives the option of suppressing the generation of the .COD listing.

 *    if (Parser.listCode) PVM.listCode(codeName, codeLength);

Task 3 - Learning many languages is sometimes confusing

To be as sympathetic as possible in the face of confusion between various operators is easily achieved - we make the sub-parsers that identify these operators accept the incorrect ones, at the expense of generating an error message (or, if you want to be really kind, issue a warning only, but it is better as an error, I think):

 *  RelOp<out int op>                       (. op = CodeGen.nop; .)
 *  =    (   "="                            (. SemError("== intended?"); .)
 *         | "==" )                         (. op = CodeGen.ceq; .)
 *     | (   "<>"                           (. SemError("!= intended?"); .)
 *         | "!=" )                         (. op = CodeGen.cne; .)
 *     | "<"                                (. op = CodeGen.clt; .)
 *     | "<="                               (. op = CodeGen.cle; .)
 *     | ">"                                (. op = CodeGen.cgt; .)
 *     | ">="                               (. op = CodeGen.cge; .) .

 *    AssignOp
 *  =    "="
 *     | ":="                             (. SemError("= intended?"); .) .

Similarly, recovering from the spurious introduction of then into an IfStatement is quite easily achieved. At this stage it looks like this (but see later tasks).

    IfStatement<StackFrame frame>         (. Label falseLabel = new Label(!known); .)
    =  "if" "(" Condition ")"             (. CodeGen.branchFalse(falseLabel); .)
 *     [ "then"                           (. SemError("then is not used in Parva"); .)
       ] Statement<frame>                 (. falselabel.here(); .) .

Task 4 - Things are not always what they seem

Issuing warnings for empty statements is easy

    Statement<StackFrame frame>
 *  =  SYNC (   Block<out empty, frame>
              | ConstDeclarations
              | VarDeclarations<frame>                                                          //
              | AssignmentStatement
              | IfStatement<frame>
              | WhileStatement<frame>
              | HaltStatement
              | ReturnStatement
              | ReadStatement
              | WriteStatement
              | "stackdump" ";"           (. if (debug) CodeGen.dump(); .)
 *            | ";"                       (. if (warnings) Warning("empty statement"); .)
            ) .

Task 5 - How long is a piece of string?

The prac sheet asked why languages generally impose a restriction that a literal string must be contained on a single line of code. The reason is quite simple - it becomes difficult to see or track the control characters and spaces that would otherwise be buried in the string. It is easier and safer for language designers to use the escape sequence idea if they need to cater for non-graphic characters in strings and character literals.

Concatenating strings is simple. The place to do it is in the StringConst production which calls on a OneString parser to obtain the substrings (which have had their leading quotes and internal escape characters processed by the time the concatenation takes place):

    StringConst<out String str>           (. String str2; .)
    = OneString<out str>
      { [ "+" ] OneString<out str2>       (. str = str + str2; .)
      } .

    OneString<out String str>
    =  stringLit                          (. str = token.val;
                                             str = unescape(str.substring(1, str.length() - 1)); .) .

Task 6 - "I am lost, but see in simple childhood something of the base" (Wordsworth)

Firstly we extend the character sets:

    CHARACTERS
      lf         = CHR(10) .
      cr         = CHR(13) .
      backslash  = CHR(92) .
      control    = CHR(0) .. CHR(31) .
      letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
 *    decDigit   = "0123456789" .
 *    decFirst   = "123456789" .
 *    binDigit   = "01" .
 *    hexDigit   = decDigit + "ABCDEFabcdef" .
      stringCh   = ANY - '"' - control - backslash .
      charCh     = ANY - "'" - control - backslash .
      printable  = ANY - control .

and modify the TOKENS section to use these:

    TOKENS
      decNumber  = "0" | decFirst { decDigit } .
      binNumber  = binDigit { binDigit } "%" .
      hexNumber  = decDigit { hexDigit } ( "H" | "h" ) .

and then we extend the IntConst production:

    IntConst<out int value>                 (. value = 0; .)
 *    (   decNumber                         (. value = strToInt(token.val, 10); .)
 *      | binNumber                         (. value = strToInt(token.val.substring(0, token.val.length() - 1), 2); .)
 *      | hexNumber                         (. value = strToInt(token.val.substring(0, token.val.length() - 1), 16); .)
 *    ) .

where the auxiliary function strToInt() is defined (at the top of the grammar) as

 *  static int strToInt(String str, int inBase) {
 *  // Returns the integer value of the lexeme str expressed in a given base
 *    int value = 0;
 *    try {
 *      value = Integer.parseInt(str, inBase);
 *    } catch (NumberFormatException e) {
 *      value = 0; SemError("number too large");
 *    }
 *    return value;
 *  }

Task 7 - Two of the three Rs - Reading and Writing

The extensions to ReadStatement and WriteStatement are very simple. It is useful to allow both the readLine() and writeLine() forms of these statements to have empty argument lists, something a little meaningless for the former read() and write() statements. The extensions needed to the code generator and the PVM should be obvious, but it was clear from some submissions that some of you are unfamiliar with the concept of readLine() as a device for ignoring the rest of a line in the data file (not, for heaven's sake, in the grammar!). And while we are at it, let's make the HaltStatement as general as possible.

 *  ReadStatement
 *  = (  "read"     "("   ReadList    ")"
 *     | "readLine" "(" [ ReadList ]  ")"   (.CodeGen.readLine(); .)
 *    )
 *    WEAK ";" .
 *
 *  ReadList
 *  = ReadElement { WEAK "," ReadElement } .
 *
 *  WriteStatement
 *  = (   "write"     "("   WriteList   ")"
 *      | "writeLine" "(" [ WriteList ] ")" (. CodeGen.writeLine(); .)
 *    )
 *    WEAK ";" .
 *
 *  WriteList
 *  = WriteElement { WEAK "," WriteElement } .
 *
 *  HaltStatement                           /* optional arguments! */
 *  = "halt" [ "(" [ WriteList ] ")" ]
 *    WEAK ";"                              (. CodeGen.leaveProgram(); .) .
 *

Task 8 - Repeat these exercises until you get the idea

Adding the basic repeat - until loop to Parva is very easy too, since all that is needed is a "backward" branch.

 *  RepeatStatement<StackFrame frame>     (. Label startLoop = new Label(known); .)
 *  =  "repeat"
 *       { Statement<frame> }
 *     WEAK "until"
 *     "(" Condition ")" WEAK ";"         (. CodeGen.branchFalse(startLoop); .) .

You might like to think of how we could warn users who write silly code like this (or even forbid it).

        repeat until (x > 4);         // Excuse me, repeat what, exactly?

Task 9 - You had better do this one or else....

The problem, firstly, asked for the addition of an else option to the IfStatement. Adding an else option to the IfStatement efficiently is easy once you see the trick. Note the use of the "no else part" option associated with an action, even in the absence of any terminals or non-terminals. As mentioned earlier, this is a very useful technique to remember.

    IfStatement<StackFrame frame>         (. int count;
                                             Label falseLabel = new Label(!known);
                                             Label outLabel = new Label(!known); .)
    =  "if" "(" Condition ")"             (. CodeGen.branchFalse(falseLabel); .)
       [ "then"                           (. SemError("then is not used in Parva"); .)
 *     ] Statement<frame>
 *     (   "else"                         (. CodeGen.branch(outLabel);
 *                                           falseLabel.here(); .)
 *         Statement<frame>               (. outLabel.here(); .)
 *       | /* no else part */             (. falseLabel.here(); .)
 *     ) .

Many - perhaps most - people in attempting this problem come up with the following sort of thing instead. This can generate BRN instructions where none are needed. Devoid of checking, just to save space:

    IfStatement<StackFrame frame>         (. Label falseLabel = new Label(!known);
                                             Label outLabel = new Label(!known); .)
    =  "if" "(" Condition ")"             (. CodeGen.branchFalse(falseLabel); .)
         Statement<frame>                 (. CodeGen.branch(outLabel);
                                             falseLabel.here(); .)
       [ "else"  Statement<frame>  ]      (. outLabel.here(); .) .

For example, source code like

if (i == 12) k = 56;

leads to object code like

        12    LDA  0
        14    LDV
        15    LDC  12
        17    CEQ
        18    BZE  27
        20    LDA  5
        22    LDC  56
        24    STO
        25    BRN  27        // unnecessary
        27    ....

Task 10 - This has gone on long enough - time for a break

The syntax of the BreakStatement is, of course, trivial. The catch is that one has to allow these statements only in the context of loops. Trying to find a context-free grammar with this restriction is not worth the effort.

One approach that incorporates context-sensitive checking in conjunction with code generation is based on passing labels as arguments to various subparsers. We change the parsers for Statement and Block as follows:

 *  Statement<StackFrame frame, Label breakLabel>
 *  =  SYNC (   Block<frame, breakLabel>
              | ConstDeclarations
              | VarDeclarations<frame>
              | AssignmentStatement
 *            | IfStatement<frame, breakLabel>
              | WhileStatement<frame>
              | RepeatStatement<frame>
 *            | BreakStatement<breakLabel>
              | HaltStatement
              | ReturnStatement
              | ReadStatement
              | WriteStatement
              | "stackdump" ";"           (. if (debug) CodeGen.dump(); .)
              | ";"                       (. if (warnings) Warning("empty statement"); .)
            ) .

 *  Block<StackFrame frame, Label breakLabel>
    =                                     (. Table.openScope(); .)
       "{"
 *        { Statement<frame, breakLabel>
          }
       WEAK "}"                           (. Table.closeScope(); .) .

The very first call to Statement passes null as the value for the breakLabel:

                                               StackFrame frame = new StackFrame();
                                               Table.openScope();
                                               Label DSPLabel = new Label(known);
                                               CodeGen.openStackFrame(0); .)
 *     WEAK "{" { Statement<frame, null> }
       WEAK "}"                             (. if (debug) Table.printTable(OutFile.StdOut);

The parsers for the statements that are concerned with looping, breaking, and making decisions become

 *  IfStatement<StackFrame frame, Label breakLabel>
                                          (. Label falseLabel = new Label(!known);
                                             Label outLabel = new Label(!known); .)
    =  "if" "(" Condition ")"             (. CodeGen.branchFalse(falseLabel); .)
       [ "then"                           (. SemError("then is not used in Parva"); .)
 *     ] Statement<frame, breakLabel>
 *     (   "else"                         (. CodeGen.branch(outLabel);
 *                                           falseLabel.here(); .)
 *         Statement<frame, breaklabel>   (. outLabel.here(); .)
 *       | /* no else part */             (. falseLabel.here(); .)
 *     ) .
    .

    WhileStatement<StackFrame frame>      (. Label loopExit  = new Label(!known);
                                             Label loopStart = new Label(known); .)
    =  "while" "(" Condition ")"          (. CodeGen.branchFalse(loopExit); .)
 *     Statement<frame, loopExit>         (. CodeGen.branch(loopStart);
                                             loopExit.here(); .) .

    BreakStatement<Label breakLabel>
 *  =  "break"                            (. if (breakLabel == null)
 *                                             SemError("break is not allowed here");
 *                                           else CodeGen.branch(breakLabel); .)
       WEAK ";" .

    RepeatStatement<StackFrame frame>     (. Label loopExit = new Label(!known);
                                             Label loopStart = new Label(known); .)
    =  "repeat"
 *       { Statement<frame, loopExit> }
       WEAK "until"
       "(" Condition ")" WEAK ";"         (. CodeGen.branchFalse(loopStart);
 *                                           loopExit.here(); .) .

There is at least one other way of solving the problem, which involves using a local variable in the parsing methods to "stack" up the old label, assigning a new one, and then restoring the old one afterwards. Only one person tried this (and got it correct). The fact that so many other people used the method given here suggests to me, that this solution was passed around more than it should have been, although a hint was given in one lecture.

Now that you have seen it done, think about how you would add a ContinueStatement to Parva as well.

Task 11 - Your professor is quite a character

To allow for a character type involves one in a lot of straightforward alterations, as well as some more elusive ones and few, if any, submissions were even vaguely correct. in fact, most were pretty terrible, suggesting tha many students had simply not grasped the concept of type and compatibility at all. These are important concepts in compiling, so make sure you study what follows very carefully.

Firstly, we extend the definition of the Types class:

    class Types {
      public static final int
        noType   =  0,             // identifier (and expression) types.  The numbering is
        nullType =  2,             // significant as array types are denoted by these
        intType  =  4,             // numbers + 1
        boolType =  6,
 *      charType =  8,
 *      voidType = 10;
    } // end Types

The Table class requires a similar small change to introduce the new type name needed if the symbol table is to be displayed:

  *  typeNames.Add("char");
  *  typeNames.Add("char[]");

A minor change to the Constant production is needed to allow character literals to be regarded as of the new charType. Most submissions at least got this far:

    Constant<out ConstRec con>            (. con = new ConstRec(); .)
    =   IntConst<out con.value>           (. con.type = Types.intType; .)
 *    | CharConst<out con.value>          (. con.type = Types.charType; .)
      | "true"                            (. con.type = Types.boolType; con.value = 1; .)
      | "false"                           (. con.type = Types.boolType; con.value = 0; .)
      | "null"                            (. con.type = Types.nullType; con.value = 0; .) .

Reading and writing single characters is easy, unless as some submissions did, you just missed this!

    ReadElement                           (. String str;
                                             DesType des; .)
 *  =   StringConst<out str>              (. CodeGen.writeString(str); .)
      | Designator<out des>               (. if (des.entry.kind != Kinds.Var)
                                               SemError("wrong kind of identifier");
                                             switch (des.type) {
                                               case Types.intType:
                                               case Types.boolType:
 *                                             case Types.charType:
                                                 CodeGen.read(des.type); break;
                                               default:
                                                 SemError("cannot read this type"); break;
                                             } .) .

    WriteElement                          (. int expType;
                                             String str; .)
 *  =   StringConst<out str>              (. CodeGen.writeString(str); .)
      | Expression<out expType>           (. switch (expType) {
                                               case Types.intType:
                                               case Types.boolType:
 *                                             case Types.charType:
                                                 CodeGen.write(expType); break;
                                               default:
                                                 SemError("cannot write this type"); break;
                                             } .) .

The associated code generating methods require matching additions:

    public static void read(int type) {
    // Generates code to read a value of specified type
    // and store it at the address found on top of stack
      switch (type) {
        case Types.intType:  emit(PVM.inpi); break;
        case Types.boolType: emit(PVM.inpb); break;
 *      case Types.charType: emit(PVM.inpc); break;
      }
    }

    public static void write(int type) {
    // Generates code to output value of specified type from top of stack
      switch (type) {
        case Types.intType:  emit(PVM.prni); break;
        case Types.boolType: emit(PVM.prnb); break;
 *      case Types.charType: emit(PVM.prnc); break;
      }
    }

The major part of this exercise was concerned with the changes needed to apply various constraints on operands of the char type. Essentially, and annoyingly perhaps, in the C family of languages it is a sort of arithmetic type when this is convenient (this is called "auto-promotion". Explicitly, it ranks as an arithmetic type, in that expressions of the form

character + character
character > character
character + integer integer + character
character > integer integer > character

are all allowable. This can be handled by modifying the helper methods in the parser as follows:

    static boolean isArith(int type) {
 *    return type == Types.intType || type == Types.charType || type == Types.noType;
    }

    static boolean compatible(int typeOne, int typeTwo) {
 *  // Returns true if typeOne is compatible (and comparable for equality) with typeTwo
      return    typeOne == typeTwo
 *           || isArith(typeOne) && isArith(typeTwo)
             || typeOne == Types.noType || typeTwo == Types.noType
             || isRef(typeOne) && typeTwo == Types.nullType
             || isRef(typeTwo) && typeOne == Types.nullType;
    }

The preceding discussion relates to expression compatibility. However, assignment compatibility is more restrictive. Assignments of the form

integer = integer expression
integer = character expression
character = character expression

are allowed, but

character = integer expression

is not allowed. This may be checked with the aid of a further helper method, assignable(). I don't recall any submission making this distinction, preferring to tweak compatible incorrectly.

 *  static boolean assignable(int typeOne, int typeTwo) {
 *  // Returns true if a variable of typeOne may be assigned a value of typeTwo
 *    return    typeOne == typeTwo
 *           || typeOne == Types.intType && typeTwo == Types.charType
 *           || typeOne == Types.noType || typeTwo == Types.noType
 *           || isRef(typeOne) && typeTwo == Types.nullType;
 *  }

The assignable() function call now takes the place of the compatible() function call in the many places in OneVar and AssignmentStatement where, previously, calls to compatible() appeared.

We turn finally to consideration of the changes needed to the various sub-parsers for expressions.

A casting mechanism is introduced to handle the situations where it is necessary explicitly to convert integer values to characters, so that

character = (char) integer

is allowed, and for completeness, so are

integer = (int) character
integer = (char) character
character = (char) character

A great many submissions made the mistake of thinking that a cast can only appear in the context of a simple assignment, that is, restricted to statements like

    char  ch;
    int x, y, z;

    ch = (char) SomeExpression
    ch = (char) x + y + z;  // understood wrongly to "mean"
                            // ch = (char) (x + y + z);  // a character assignment

but that is not the case. Casting applies to a component of an expression, so that the above really "means":

    ch = ((char) x)  + y + z;    // an incorrect integer assignment again
                                 // integer expressions are incompatible with
                                 // character target designators

and as a further example it is quite legal to write

    boolean b = 'A' < (char) (x + (int) 'B') ;

Of course, the C family syntax is crazy (in spite of what some of my colleagues think). It would have been infinitely better to use a notation like

    boolean b = 'A' > char(x + int('B'));

but they would not have been able to do that easily (do you see why? It might be an exam question; you never know my devious mind).

To get it right requires that casting be handled within the Factor production, which has to be factored to deal with the potential LL(1) trap in distinguishing between components of the form "(" "char" ")" and "(" Expression ")":

Casting operations are accompanied by a type check and a type conversion; the (char) cast also introduces the generation of run-time code for checking that the integer value to be converted lies within range.

    Factor<out type>                      (. type = Types.noType;
                                             int size;
                                             DesType des;
                                             ConstRec con; .)
    =    Designator<out des>              (. type = des.type;
                                             switch (des.entry.kind) {
                                               case Kinds.Var:
                                                 CodeGen.dereference();
                                                 break;
                                               case Kinds.Con:
                                                 CodeGen.loadConstant(des.entry.value);
                                                 break;
                                               default:
                                                 SemError("wrong kind of identifier");
                                                 break;
                                             } .)
       | Constant<out con>                (. type = con.type;
                                             CodeGen.loadConstant(con.value); .)
       | "new" BasicType<out type>        (. type++; .)
         "[" Expression<out size>         (. if (!isArith(size))
                                               SemError("array size must be integer");
                                             CodeGen.allocate(); .)
         "]"

 *     | "("
 *       (   "char" ")"
 *           Factor<out type>             (. if (!isArith(type))
 *                                             SemError("invalid cast");
 *                                           else type = Types.charType;
 *                                           CodeGen.castToChar(); .)
 *         | "int" ")"
 *           Factor<out type>             (. if (!isArith(type))
 *                                             SemError("invalid cast");
 *                                           else type = Types.intType; .)
 *         | Expression<out type> ")"
 *       ) .

Strictly speaking the above grammar departs slightly from the Java version, where the casting operator is regarded as weaker than the parentheses around an Expression, but in practice it makes little difference.

Various of the other productions need modification. The presence of an arithmetic operator correctly placed between character or integer operands must result in the sub-expression so formed being of integer type (and never of character type). So, for example:

     Term<out int type>                   (. int type2;
                                             int op;
                                             Label shortcircuit = new Label(!known); .)
     =  Factor<out type>
        { MulOp<out op>                   (. if (op == CodeGen.and)
                                               CodeGen.booleanOp(shortcircuit, CodeGen.and); .)
          Factor<out type2>               (. switch (op) {
                                               case CodeGen.and:
                                                 if (!isBool(type) || !isBool(type2))
                                                   SemError("boolean operands needed");
                                                 type = Types.boolType;
                                                 break;
                                               default:
                                                 if (!isArith(type) || !isArith(type2)) {
                                                   SemError("arithmetic operands needed");
                                                   type = Types.noType;
                                                 }
 *                                               else type = Types.intType;
                                                 CodeGen.binaryOp(op);
                                                 break;
                                             } .)
        }                                 (. shortcircuit.here(); .)
     .

Similarly a prefix + or - operator applied to an integer or character Factor creates a new factor of integer type (see full solution for details).

The extra code generation method we need is as follows:

    public static void castToChar() {
    // Generates code to check that TOS is within the range of the character type
      emit(PVM.i2c);
    }

and within the switch statement of the emulator method we need:

    case PVM.i2c:           // check convert character to integer
      if (mem[cpu.sp] < 0 || mem[cpu.sp] > maxChar) ps = badVal;
      break;

The interpreter has another opcode for checked storage of characters, but if the i2c opcodes are inserted correctly it appears that we do not really need stoc:

    case PVM.stoc:          // character checked store
      tos = pop(); adr = pop();
      if (inBounds(adr))
        if (tos >= 0 && tos <= maxChar) mem[adr] = tos;
        else ps = badVal;
      break;

Task 12 - Make the change; enjoy life; upgrade now to Parva`++` (Ta-ra!)

It might not at first have been obvious, but I had fondly hoped that everyone eventually saw that this extension is handled at the initial level by clever modifications to the AssignmentStatement production, which has to be factorized in such a way as to avoid LL(1) conflicts. (alas, I was wrong again. Didn't we discuss this very point in lectures and in prac 21?) The code below handles this task (including the tests for compatibility and for the designation of variables rather than constants that several students omitted) by assuming the existence of a few new machine opcodes, as suggested in the textbook.

    AssignmentStatement                   (. int expType;
                                             DesType des;
 *                                           boolean inc = true; .)
    = (  Designator<out des>              (. if (des.entry.kind != Kinds.Var)
                                               SemError("invalid assignment"); .)
          (   AssignOp
              Expression<out expType>     (. if (!assignable(des.type, expType))
                                               SemError("incompatible types in assignment");
                                             CodeGen.assign(des.type); .)
 *          | ( "++" | "--"               (. inc = false; .)
 *            )                           (. if (!isArith(des.type))
 *                                             SemError("arithmetic type needed");
 *                                           CodeGen.incOrDec(inc, des.type); .)
 *         )
 *      |  ( "++" | "--"                   (. inc = false; .)
 *         ) Designator<out des>           (. if (des.entry.kind != Kinds.Var)
 *                                             SemError("variable designator required");
 *                                           if (!isArith(des.type))
 *                                             SemError("arithmetic type needed");
 *                                           CodeGen.incOrDec(inc, des.type); .)
      ) WEAK ";" .

The extra code generation routine is straightforward, but note that we should cater for characters specially

    public static void incOrDec(boolean inc, int type) {
    // Generates code to increment the value found at the address currently
    // stored at the top of the stack.
    // If necessary, apply character range check
 *    if (type == Types.charType) emit(inc ? PVM.incc : PVM.decc);
 *    else emit(inc ? PVM.inc : PVM.dec);
    }

As usual, the extra opcodes in the PVM make all this easy to achieve at run time. Some submissions might have forgotten to include the check that the address was "in bounds". I suppose one could argue that if the source program were correct, then the addresses could not go out of bounds, but if the interpreter were to be used in conjunction with a rather less fussy assembler (as we had in earlier practicals) it would make sense to be cautious.

    case PVM.inc:           // int ++
      adr = pop();
      if (inBounds(adr)) mem[adr]++;
      break;
    case PVM.dec:           // int --
      adr = pop();
      if (inBounds(adr)) mem[adr]--;
      break;
    case PVM.incc:          // char ++
      adr = pop();
      if (inBounds(adr))
        if (mem[adr] < maxChar) mem[adr]++;
        else ps = badVal;
      break;
    case PVM.decc:          // char --
      adr = pop();
      if (inBounds(adr))
        if (mem[adr] > 0) mem[adr]--;
        else ps = badVal;
      break;

Task 13 - All assignments have equals, some have more equals than others

These extensions may be handled by further modifications to the AssignmentStatement production. The code below shows this (including the tests for compatibility that some students omitted) by assuming the existence of the DUP opcode, as suggested in the textbook.

Clearly nobody realised that there are Boolean assignments of the same form; I don't remember any attempts to incorporate them. For the assignment operators &= and |= we have to make sure that we generate "short circuit" operations (maybe some of you missed the lecture on short circuit evaluation too? Revenge might be sweet!). This adds a bit to the apparent complexity of the production, as you can see below:

    AssignmentStatement                   (. int expType;
                                             DesType des;
                                             int op;
 *                                           boolean inc = true;
                                             Label shortcircuit = new Label(!known);.)
    = ( Designator<out des>               (. if (des.entry.kind != Kinds.Var)
                                               SemError("invalid assignment"); .)
 *     (  AssignOp<out op>                (. if (op != CodeGen.nop) {
 *                                            CodeGen.duplicate();
 *                                            CodeGen.dereference();
 *                                            switch(op) {
 *                                              case CodeGen.add:
 *                                              case CodeGen.sub:
 *                                              case CodeGen.mul:
 *                                              case CodeGen.div:
 *                                              case CodeGen.rem:
 *                                                if (!isArith(des.type))
 *                                                  SemError("arithmetic destination required");
 *                                                break;
 *                                              case CodeGen.and:
 *                                              case CodeGen.or:
 *                                                if (!isBool(des.type))
 *                                                  SemError("boolean destination required");
 *                                                CodeGen.booleanOp(shortcircuit, op);
 *                                                break;
 *                                            }
 *                                          } .)
 *        Expression<out expType>         (. if (!assignable(des.type, expType))
 *                                             SemError("incompatible types in assignment");
 *                                           switch (op) {
 *                                             case CodeGen.nop:
 *                                               break;
 *                                             case CodeGen.and:
 *                                             case CodeGen.or:
 *                                               shortcircuit.here();
 *                                               break;
 *                                             case CodeGen.add:
 *                                             case CodeGen.sub:
 *                                             case CodeGen.mul:
 *                                             case CodeGen.div:
 *                                             case CodeGen.rem:
 *                                               CodeGen.binaryOp(op);
 *                                               break;
 *                                           }
 *                                           CodeGen.assign(des.type); .)
         | ( "++" | "--"                  (. inc = false; .)
           )                              (. if (!isArith(des.type))
                                               SemError("arithmetic type needed");
                                             CodeGen.incOrDec(inc, des.type); .)
       )
      |  ( "++" | "--"                    (. inc = false; .)
         ) Designator<out des>            (. if (des.entry.kind != Kinds.Var)
                                               SemError("variable designator required");
                                             if (!isArith(des.type))
                                               SemError("arithmetic type needed");
                                             CodeGen.incOrDec(inc, des.type); .)
      )
      WEAK ";" .

The above code requires only the one new PVM code. One quite ingenious submission achieved (most of) the required effect by introduciing new opcodes that would line up PVM code as for a simple assignment, and then substitute the new opcode in place of the STO. So, for example, and assignment like

           a *= b + 400;

would generate code like

where the interpretation of STP would be

          case  PVM.stp:
            int val = pop();
            int adr = pop();
            if (inBounds(adr)) mem[adr] = mem[adr] * val;

As I have often said, I enjoy it when students come up with innovative solutions to my exercises. In this case they had not handled Boolean assignments, or bothered with assignment compatibility checks, but you might all like to think how the correct solution should have been formulated. Nice examination question, thanks guys!

Task 14 - Beg, borrow and steal ideas from other languages

This exercise called on you to extend Parva to adopt an idea used in Pascal, where a statement like

write(X : 5, X + A : 12, X - Y : 2 * N);

will write the values of X, X+A and X-Y in fields of widths 5, 12 and 2*N respectively. This is easily handled by modifying the production for WriteElement:

    WriteElement                          (. int expType, formType;
                                             boolean formatted = false;
                                             String str; .)
    =   StringConst<out str>              (. CodeGen.writeString(str); .)
      | Expression<out expType>           (. if (!(isArith(expType) || expType == Types.boolType))
                                               SemError("cannot write this type"); .)
 *      [ ":"  Expression<out formType>   (. if (formType != Types.intType)
 *                                             SemError("fieldwidth must be integral");
 *                                           formatted = true; .)
 *      ]                                 (. switch (expType) {
                                               case Types.intType:
                                               case Types.boolType:
                                               case Types.charType:
 *                                               CodeGen.write(expType, formatted); break;
                                               default:
                                                 break;
                                             } .) .

and modifying the code generation routine

    public static void write(int type, boolean formatted) {
    // Generates code to output value of specified type from top of stack
 *    if (formatted)
 *      switch (type) {
 *        case Types.intType:  emit(PVM.prnfi);   break;
 *        case Types.boolType: emit(PVM.prnfb);   break;
 *        case Types.charType: emit(PVM.prnfc);   break;
 *      }
      else
        switch (type) {
          case Types.intType:  emit(PVM.prni);    break;
          case Types.boolType: emit(PVM.prnb);    break;
          case Types.charType: emit(PVM.prnc);    break;
        }
    }

and adding new opcodes whose interpretation is

    case PVM.prnfi:         // integer output formatted
      if (tracing) results.write(padding);
      fieldWidth = pop();
      results.write(pop(), fieldWidth);
      if (tracing) results.writeLine();
      break;
    case PVM.prnfb:         // boolean output formatted
      if (tracing) results.write(padding);
      fieldWidth = pop();
      if (pop() != 0) results.write(" true  ", fieldWidth);
      else results.write(" false ", fieldWidth);
      if (tracing) results.writeLine();
      break;
    case PVM.prnfc:         // character output formatted
      if (tracing) results.write(padding);
      fieldWidth = pop();
      results.write((char) (Math.abs(pop()) % (maxChar + 1)), fieldWidth);
      if (tracing) results.writeLine();
      break;

Task 15 - Let's untangle that heap of string

Not many people attempted this. There were two successful submissions - if a little inelegant in places, I thought

Until now, any strings encountered in a Parva program had been stacked at the top end of memory, whence they were extracted by the PRNS opcode in the PVM.

In an alternative model, one that is used in the PAM (Parva Actual Machine), the strings are stored above the program code - effectively in space that the PVM would otherwise have used for its heap. A very simple program like

    void main () {
    // A fatalistic approach to the oncoming examinations?
      writeLine("I give up");
      writeLine("Goodbye world!");
    }

would be stored in memory like this

             .-----------------------------------------.
             |                                         V
    .--------^--------------------------------------------------------------------------------------------------.
    |prns |  8  |prnl |prns | 18  |prnl |halt | nul |  I  |     |  g  |  i  |  v  |  e  |     |  u  |  p  |  0  |
    `-----------------------------------------------------------------------------------------------------------'
       0     1     2     3    | 4    5     6     7     8     9    10    11    12    13    14    15    16    17
                              |
       .----------------------'
       V
    .------------------------------------------------------------------------------------------------
    |  G  |  o  |  o  |  d  |  b  |  y  |  e  |     |  w  |  o  |  r  |  l  |  d  |  !  |  0  | . . .
    `------------------------------------------------------------------------------------------------
      18    19    20    21    22    23    24    25    26    27    28    29    30    31    33    34
                                                                                                 ^
                                                                                                 |
                                                                                              cpu.hp

It was suggested that one might associate a label with each PRNS operation, since at the time one wants to generate the opcode it is not known exactly where the string will be stored. Furthermore, to develop tight code suited to a small machine like the PAM, one should optimize the analysis and code generation so that if the same string appears in the program in two or more places, only one copy is loaded in memory.

To do this it is expedient to define a small class whose instances can store a string and a list of references (labels) to one or more incomplete PRNS operations:

 *  class StringEntry {
 *    public Label reference;
 *    public String str;
 *
 *    public StringEntry(String str) {
 *      this.reference = new Label(false);
 *      this.str = str;
 *    }
 *  } // end StringEntry

Then writeString(), the code generator invoked when a ReadElement or WriteElement is parsed is rewritten as follows:

 *  public static void writeString(String str) {
 *  // Generates code to output string stored at known location in program memory
 *    int i = 0;
 *    StringEntry entry = new StringEntry(str);
 *    while (i < stringList.size() && !str.equals(stringList.get(i).str)) i++;  // search first
 *    if (i >= stringList.size())
 *      stringList.add(entry);
 *    else
 *      entry = stringList.get(i);
 *    emit(PVM.prns); emit(entry.reference.address());
 *  }

At the end of compilation this list is scanned

    WEAK "{" { Statement<frame> }
    WEAK "}"                             (. if (debug) Table.printTable(OutFile.StdOut);
                                            CodeGen.fixDSP(DSPLabel.address(), frame.size);
                                            CodeGen.leaveProgram();
 *                                          CodeGen.fixStrings();
                                            Table.closeScope(); .)

Each forward referenced is resolved (the PRNS instructions are "backpatched"), and the corresponding string is copied, character by character, to the code array. Note that the end of the executable code is first marked with the fictitious high-numbered opcode PVM.nul to allow PVM.listCode() to distinguish code from strings (this was not mentioned in the handout and occurred to me only later!).

 *  public static void fixStrings() {
 *    emit(PVM.nul);  // mark end of real code
 *    for (StringEntry s : stringList) {
 *      backPatch(s.reference.address());  // fix one or more PRNS instructions
 *      for (int i = 0; i < s.str.length(); i++)
 *        emit(s.str.charAt(i));           // copy character by character
 *      emit(0);                           // null-terminated
 *    }
 *  }

The interpreter executes a PRNS opcode with a loop, as before, save that this time it runs "forward".

    case PVM.prns:          // string output
      if (tracing) results.write(padding);
      loop = next();
      while (ps == running && mem[loop] != 0) {
 *      results.write((char) mem[loop]); loop++;
 *      if (loop > heapBase) ps = badMem;
      }
      if (tracing) results.writeLine();
      break;

and this change is needed in the listCode() method as well:

    case PVM.prns:
      i = (i + 1) % memSize;
      j = mem[i]; codeFile.write(" \"");
      while (mem[j] != 0) {
        switch (mem[j]) {
          case '\\' : codeFile.write("\\\\"); break;
          case '\"' : codeFile.write("\\\""); break;
          case '\'' : codeFile.write("\\\'"); break;
          case '\b' : codeFile.write("\\b");  break;
          case '\t' : codeFile.write("\\t");  break;
          case '\n' : codeFile.write("\\n");  break;
          case '\f' : codeFile.write("\\f");  break;
          case '\r' : codeFile.write("\\r");  break;
          default   : codeFile.write((char) mem[j]); break;
        }
 *      j++;
      }
      codeFile.write("\"");
      break;

Task 16 - Generating tighter PVM code

The changes to the code generating routines to produce the special one-word opcodes like LDA_0 and LDC_3 are very simple, on the lines of the following:

    public static void loadConstant(int number) {
    // Generates code to push number onto evaluation stack
      switch (number) {
        case -1: emit(PVM.ldc_m1); break;
        case 0:  emit(PVM.ldc_0); break;
        case 1:  emit(PVM.ldc_1); break;
        case 2:  emit(PVM.ldc_2); break;
        case 3:  emit(PVM.ldc_3); break;
        case 4:  emit(PVM.ldc_4); break;
        case 5:  emit(PVM.ldc_5); break;
        default: emit(PVM.ldc); emit(number); break;
      }
    }

    public static void loadAddress(Entry var) {
    // Generates code to push address of variable var onto evaluation stack
      switch (var.offset) {
        case 0:  emit(PVM.lda_0); break;
        case 1:  emit(PVM.lda_1); break;
        case 2:  emit(PVM.lda_2); break;
        case 3:  emit(PVM.lda_3); break;
        case 4:  emit(PVM.lda_4); break;
        case 5:  emit(PVM.lda_5); break;
        default: emit(PVM.lda); emit(var.offset); break;
      }
    }

Of course, with the Parva grammar as it was defined for this practical one would never be in a position to generate the ldc_m1 opcode, since the grammar made no provision for negative constants. It would not have been hard to extend it to do so, and you might like to puzzle out how and where this could be done.

Just for further interest, the full solution in the solution kit allows the user to choose between "optimized" and "regular" old-style code by using a pragma $O+ or command line option -o.

Week 26 - Task 3 - Peeking and Poking in memory

In week 26 students could make use of the PAM - the Parva Actual Machine, a single board computer designed by Michael Andersen (michael@steelcode.com) that can execute PVM code directly. Like other machines in its class, the PAM has some special memory addresses that allow one to read or alter the state of various devices "located" at those addresses. From Michael's online documentation:

Hardware Abstraction Layer Space

The memory from 0x6000 to 0x600B is occupied by virtual registers that can control the hardware. These registers are:

       0x6000 : Blue LEDs
       0x6001 : Red in RGB LED
       0x6002 : Green in RGB LED
       0x6003 : Blue in RGB LED
       0x6004 : DIP Switches
       0x6005 : Push buttons (ABC)
       0x6006 : LDR value (not implemented)
       0x6007 : Temperature sensor (not implemented)
       0x6008 : Delay X jiffies (a jiffy is about 10ms)
       0x6009 : Interrupt vector for button A
       0x6010 : Interrupt vector for button B
       0x6011 : Interrupt vector for button C

They can only be written to using a 'sto' instruction and can only be read from using an 'ldv' instruction.

Here is an example of a Parva program that reads the state of the switches and reflects them on the LEDs

    void main() { $C+
    // Read the dip switches and display the readings on the leds of the PAM
    // Clearly this does not work on the PVM

      const DIP = 06004H;
      const LED = 06000H;

      while (peek(DIP) != 0)
        poke(LED, peek(DIP));
    }

This code has assumed the existence of a new possibility for a Factor - peek(address) is to push the word found at the absolute address in memory onto the evaluation stack of the PAM/PVM, while poke(address, value) is a new variation on Statement that will pop the stack and store this value at the absolute address specified as the second argument.

Adding these extensions to Parva should have been no real challenge for students who had got this far. The production for PokeStatement is as follows:

    PokeStatement                         (. int type, address;
                                             ConstRec con; .)
    =  "poke" "(" Expression<out type>    (. if (!isArith(type))
                                               SemError("integer address needed"); .)
        WEAK "," Expression<out type>     (. if (!isArith(type))
                                               SemError("integer expression needed");
                                             CodeGen.assign(type); .)
       ")" WEAK ";" .

and the extra possibility within Factor is as follows:

       | "peek"
          "(" Expression<out type>        (. if (!isArith(type))
                                               SemError("Arithmetic argument needed");
                                             CodeGen.dereference();
                                             type = Types.intType; .)

Postscript

Allowing users of the compiler to choose between interpreting the PVM code or downloading it through a USB cable to a PAM board was easily achieved by modifying the Parva.frame file (and linking in the extra classes needed).

At the time of writing the software for communicating with the PAM board was available only for the Java version of Parva. It is hoped that by next year we shall have it operational on the C# version as well.

    do {
      System.err.print("\n\nI)nterpret R)un or Q)uit ");
      reply = (InFile.StdIn.readLine() + " ").toUpperCase().charAt(0);
      if (reply == 'I')
        PVM.interpret(codeLength, initSP);
      else if (reply == 'R') {
        if (!loaded) {
          NativeLink.initNatives();                            // Initialise the libraries we are using

          ArrayList<Integer> codes = new ArrayList<Integer>(); // Build up the list of codes
          for (int i = 0; i < codeLength; i++)
            codes.add(PVM.mem[i]);

          ProgramImage pi = new ProgramImage();                // Copy the codes to the PAM
          pi.writeCodes(0, codes);
          pi.programToDevice();

          System.err.println("Loaded");
          loaded = true;                                       // So we can rerun without reloading
        }

        InteractiveConsole ic = new InteractiveConsole();
        System.err.println("Running");
        ic.startBlocking();
      } // reply = 'R'
    } while (reply != 'Q');