Solutions for Practical 20

Computer Science 3 - 2005

Programming Language Translation

Practical for Week 20, beginning 5 September 2005 - Solutions

There were some very good solutions submitted, and some very energetic ones too - clearly a lot of students had put in many hours developing their code. This is very encouraging. Do learn to put your names into the introductory comments of programs that you write.

Full source for the solutions summarized here can be found in the ZIP file on the servers - PRAC20A.ZIP. These sources also include the extensions needed to handle character I/O and storage, as explored in the prac test.

Task 2

At the heart of this lay a search loop:

    read("Search for (99 stops)? ", item);  // first search item
    while (item != 99) {
      i = 1;
      while (list[i] != item)  i = i + 1;

It's an awful program because the array subscript goes out of bounds if the item cannot be located.

Task 3

The better version of the program, in the file SEARCH1.PAV, contained code that read

    list[0] = item;                       // post as a sentinel
    i = 10;
    while (list[i] != item) i = i - 1;    // must terminate!

This is much better because it uses a sentinel technique for the linear search. However, it can still be broken if you give it data that is out of range of the int type, give it non-numeric data, or supply 99 as one of the initial list of 10 numbers.
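
For comparison, the same sentinel idea can be sketched in Java. This is only an illustration, not code from the solution kit; the class and method names are invented here, and slot 0 of the array is assumed to be reserved for the sentinel, as in SEARCH1.PAV.

```java
public class SentinelSearch {
    // Searches list[1..n] for item, using list[0] as the sentinel slot.
    // Returns the position (1..n) of the match found when scanning downwards,
    // or 0 if the item is not present.
    static int search(int[] list, int n, int item) {
        list[0] = item;                 // post as a sentinel
        int i = n;
        while (list[i] != item) i--;    // must terminate, at i == 0 at the latest
        return i;
    }

    public static void main(String[] args) {
        int[] list = new int[11];       // slots 1..10 hold data, slot 0 is reserved
        for (int i = 1; i <= 10; i++) list[i] = i * i;
        System.out.println(search(list, 10, 49));   // found in position 7
        System.out.println(search(list, 10, 50));   // 0 signals "not found"
    }
}
```

Because the sentinel guarantees a match, the loop needs only one comparison per step and the subscript can never fall below 0.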

Task 5

Task 5 was to hand-compile the Factorial program into PVM code. Most people got a long way towards this. Have a look at how I have commented this, using "high level" code, rather than detailed line by line commentary of the form "load address of X". Most of the submissions had "commentary" that was, frankly, almost useless. Try the following test for assembler code: Cover over the real code with a piece of paper and read only the comments. Does what you read make sense on its own? I maintain that it should. The easiest way to do this is by using a high level algorithmic notation.

   0 DSP    3       ; n is v0, f is v1, i is v2      42 MUL
   2 LDA    0                                        43 STO
   4 LDC    1                                        44 LDA    2       ;     f = f * i;
   6 STO            ; n = 1;                         46 LDA    2
   7 LDA    0                                        48 LDV
   9 LDV                                             49 LDC    1
  10 LDC    20      ; // max = 20, constant          51 SUB
  12 CLE            ; while (n <= max) {             52 STO            ;     i = i - 1;
  13 BZE    78                                       53 BRN    26      ;   }
  15 LDA    1                                        55 LDA    0
  17 LDC    1                                        57 LDV
  19 STO            ;   f = 1;                       58 PRNI           ;   write(n);
  20 LDA    2                                        59 PRNS   "! = "  ;   write("! = ");
  22 LDA    0                                        61 LDA    1
  24 LDV                                             63 LDV
  25 STO            ;   i = n;                       64 PRNI           ;   write(f);
  26 LDA    2                                        65 PRNS   "\n"    ;   write("\n") (or use PRNL)
  28 LDV                                             67 LDA    0
  29 LDC    0                                        69 LDA    0
  31 CGT            ;   while (i > 0) {              71 LDV
  32 BZE    55                                       72 LDC    1
  34 LDA    1                                        74 ADD
  36 LDA    1                                        75 STO            ;   n = n + 1;
  38 LDV                                             76 BRN    7       ; }
  39 LDA    2                                        78 HALT
  41 LDV

Task 6 - Trapping overflow

Checking for overflow in multiplication and division was not well done. You cannot multiply and then try to check overflow (it is too late by then) - you have to detect it in a more subtle way. Here is one way of doing it - note the check to prevent a division by zero. This does not use any precision greater than that of the simulated machine itself. Note that it is necessary to check for "division by zero" in the rem code as well!

    case PVM.mul:           // integer multiplication
      tos = pop();
      sos = pop();
      if (tos != 0 && Math.abs(sos) > maxInt / Math.abs(tos)) ps = badVal;
      else push(sos * tos);
      break;

    case PVM.div:           // integer division (quotient)
      tos = pop();
      if (tos == 0) ps = divZero;
      else push(pop() / tos);
      break;
    case PVM.rem:           // integer division (remainder)
      tos = pop();
      if (tos == 0) ps = divZero;
      else push(pop() % tos);
      break;

It is possible to use an intermediate long variable (but don't forget the casting operations or the abs function):

    case PVM.mul:           // integer multiplication
      tos = pop();
      sos = pop();
      long temp = (long) sos * (long) tos;
      if (Math.abs(temp) > maxInt) ps = badVal;
      else push(sos * tos);
      break;
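
The two tests can be compared outside the interpreter with a small stand-alone sketch. The class and method names below are invented for the illustration, and maxInt is assumed to be Integer.MAX_VALUE, which may not match the value used in the PVM source.

```java
public class MulOverflow {
    static final int maxInt = Integer.MAX_VALUE;   // assumed machine limit

    // Division-based test: uses no precision beyond int.
    static boolean overflowsByDivision(int sos, int tos) {
        return tos != 0 && Math.abs(sos) > maxInt / Math.abs(tos);
    }

    // Widening test: form the exact product in a long, then compare.
    static boolean overflowsByWidening(int sos, int tos) {
        long temp = (long) sos * (long) tos;
        return Math.abs(temp) > maxInt;
    }

    public static void main(String[] args) {
        int[][] cases = {
            {46341, 46341}, {46340, 46340}, {-65536, 32768}, {0, maxInt}, {maxInt, 1}
        };
        for (int[] c : cases)
            System.out.println(c[0] + " * " + c[1] + ": "
                + overflowsByDivision(c[0], c[1]) + " "
                + overflowsByWidening(c[0], c[1]));
    }
}
```

One caveat applies to the division test in Java: Math.abs(Integer.MIN_VALUE) is itself negative, so an operand with that value would need separate handling if it could ever appear on the stack.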

Task 7 - Improving the opcode set

This is straightforward, if a little tedious, and it is easy to leave some of the changes out and get a corrupted solution. The PVMAsm class requires modification in the switch statement that recognizes two-word opcodes:

    case PVM.brn:                            // all require numeric address field
    ...
    case PVM.ldc:
    case PVM.ldl:  // +++++++++++++++++  addition
    case PVM.stl:  // +++++++++++++++++  addition
      codeLen = (codeLen + 1) % PVM.memSize;
      if (ch == '\n')                        // no field could be found
        error("Missing address", codeLen);
      else {                                 // unpack it and store
        PVM.mem[codeLen] = src.readInt();
        if (src.error()) error("Bad address", codeLen);
      }
      break;

The PVM class requires several additions. We must add to the enumeration of the machine opcodes:

    public static final int // Machine opcodes
      ...
      ldl    = 63,     // +++++++++++++++++  additions
      stl    = 64,
      lda_0  = 65,
      ...

We must add to the switch statement in the trace method (several submissions missed this):

    static void trace(OutFile results, int pcNow) {
      switch (cpu.ir) {
        ...
        case PVM.ldl:  // +++++++++++++++++  addition
        case PVM.stl:  // +++++++++++++++++  addition
      }
      results.writeLine();
    }

and we must provide case arms for all the new opcodes. A selection of these follows; the rest can be seen in the solution kit. Notice that for consistency all the "inBounds" checks should be performed on the new opcodes too (several submissions missed this).

       case PVM.ldc_m1:        // push constant -1
         push(-1);
         break;
       case PVM.ldc_0:         // push constant 0
         push(0);
         break;
       case PVM.ldc_1:         // push constant 1
         push(1);
         break;
       ...

       case PVM.lda_0:         // push local address 0
         adr = cpu.fp - 1;
         if (inBounds(adr)) push(adr);
         break;
       case PVM.lda_1:         // push local address 1
         adr = cpu.fp - 2;
         if (inBounds(adr)) push(adr);
         break;
       ...

       case PVM.ldl:           // push local value
         adr = cpu.fp - 1 - next();
         if (inBounds(adr)) push(mem[adr]);
         break;
       case PVM.ldl_0:         // push value of local variable 0
         adr = cpu.fp - 1;
         if (inBounds(adr)) push(mem[adr]);
         break;
       case PVM.ldl_1:         // push value of local variable 1
         adr = cpu.fp - 2;
         if (inBounds(adr)) push(mem[adr]);
         break;
       ...

       case PVM.stl:           // store local value
         adr = cpu.fp - 1 - next();
         if (inBounds(adr)) mem[adr] = pop();
         break;
       case PVM.stl_0:         // pop to local variable 0
         adr = cpu.fp - 1;
         if (inBounds(adr)) mem[adr] = pop();
         break;
       case PVM.stl_1:         // pop to local variable 1
         adr = cpu.fp - 2;
         if (inBounds(adr)) mem[adr] = pop();
         break;

We must add to the method that lists out the code (several submissions missed this):

    public static void listCode(String fileName, int codeLen) {
        ...
       case PVM.brn:
       case PVM.ldc:
       case PVM.ldl: // +++++++++++++++++  addition
       case PVM.stl: // +++++++++++++++++  addition
         i = (i + 1) % memSize; codeFile.write(mem[i]);
         break;

Finally we must add to the section that initializes the mnemonic lookup table:

    public static void init() {
      ...
      mnemonics[PVM.ldl]    = "LDL";    // +++++++++++++++++  additions
      mnemonics[PVM.stl]    = "STL";
      mnemonics[PVM.lda_0]  = "LDA_0";
      ...

As an example of using the new opcodes, here is the Factorial program recoded in considerably fewer operations. Many submissions only used some of the new opcodes, ignoring the STL ones, for example.

   0 DSP    3       ; n is v0, f is v1, i is v2       22 STL_1          ;     f = f * i;
   2 LDC_1                                            23 LDL_2
   3 STL_0          ; n = 1;                          24 LDC_1
   4 LDL_0                                            25 SUB
   5 LDC    20      ; // max = 20, constant           26 STL_2          ;     i = i - 1;
   7 CLE            ; while (n <= max) {              27 BRN    14      ;   }
   8 BZE    43                                        29 LDL_0
  10 LDC_1                                            30 PRNI           ;   write(n);
  11 STL_1          ;   f = 1;                        31 PRNS   "! = "  ;   write("! = ");
  12 LDL_0                                            33 LDL_1
  13 STL_2          ;   i = n;                        34 PRNI           ;   write(f);
  14 LDL_2                                            35 PRNS   "\n"    ;   write("\n") (or use PRNL)
  15 LDC_0                                            37 LDL_0
  16 CGT            ;   while (i > 0) {               38 LDC_1
  17 BZE    29                                        39 ADD
  19 LDL_1                                            40 STL_0          ;   n = n + 1;
  20 LDL_2                                            41 BRN    4       : }
  21 MUL                                              43 HALT
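
The saving can be seen in miniature with a toy interpreter. The sketch below is not the PVM source: the opcode numbers, the memory size and the class name are all invented for the illustration, and the inBounds checks are left out to keep it short. It runs "x = 5; x = x + 1;" coded first with LDA, LDV and STO and then with LDL and STL, counting the instructions executed.

```java
public class LocalOpcodes {
    // Hypothetical opcode numbers, chosen only for this sketch.
    static final int LDA = 0, LDV = 1, STO = 2, LDC = 3, ADD = 4,
                     LDL = 5, STL = 6, HALT = 7;

    // x = 5; x = x + 1;  using the original opcodes
    static final int[] LONG_FORM = {
        LDA, 0, LDC, 5, STO, LDA, 0, LDA, 0, LDV, LDC, 1, ADD, STO, HALT
    };
    // the same statements using the new opcodes
    static final int[] SHORT_FORM = {
        LDC, 5, STL, 0, LDL, 0, LDC, 1, ADD, STL, 0, HALT
    };

    static int[] mem = new int[64];
    static int pc, sp, fp, steps;

    static int next()       { return mem[pc++]; }
    static void push(int v) { mem[--sp] = v; }
    static int pop()        { return mem[sp++]; }

    // Runs the code and returns the final value of local variable 0.
    static int run(int[] code) {
        java.util.Arrays.fill(mem, 0);
        System.arraycopy(code, 0, mem, 0, code.length);
        pc = 0; fp = mem.length; sp = fp - 1;   // as if DSP 1 had been executed
        steps = 0;
        while (true) {
            steps++;
            switch (next()) {
                case LDA:  push(fp - 1 - next()); break;
                case LDV:  push(mem[pop()]); break;
                case STO:  { int v = pop(); mem[pop()] = v; break; }
                case LDC:  push(next()); break;
                case ADD:  { int tos = pop(); push(pop() + tos); break; }
                case LDL:  push(mem[fp - 1 - next()]); break;
                case STL:  mem[fp - 1 - next()] = pop(); break;
                case HALT: return mem[fp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(LONG_FORM) + " after " + steps + " steps");   // 6 after 10 steps
        System.out.println(run(SHORT_FORM) + " after " + steps + " steps");  // 6 after 7 steps
    }
}
```

Each LDA, LDV pair collapses into one LDL, and each LDA ... STO pair into one STL, which is where the reduction in the Factorial listing comes from.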

Task 8 - Do "improvements" necessarily make things "better"?

Surprisingly, no. In the prac worksheet the suggestion was made that you study the original source to see that the original opcodes had been mapped onto the numbers 30 .. 62. This meant that you could map the new opcodes onto a set of numbers below 30, or above 62. In the prac solution kit you will find four versions of the interpreter in which this has been done.

The tables below show timings obtained on the four systems for two encodings of the infamous Sieve of Eratosthenes, differing only in that one used the compact opcodes where possible. The behaviour is quite remarkable. The optimized opcode set resulted in the execution of about 30% fewer instructions over counts running into millions, and when the optimized opcodes were mapped onto "high" internal numbers the overall execution time dropped to about 87% of that for the original opcode set. However, when the new opcodes were mapped onto low numbers, the code using the unoptimized opcode set took far longer to run, while that using the optimized opcode set took slightly less time to run. Since the only difference in the source code of the interpreter was to be found in this numerical mapping, one is forced to conclude that the underlying implementation of the large switch statement plays a key role in the performance one can expect.

Several submissions suggested that the differences could be explained away by the longer list of opcodes and the (relatively) slow lookup process that forms the basis of the opCode method in the PVM.java file (at least, that is what I think the authors were trying to say; some explanations were very badly expressed!). But this has nothing to do with it - that method is used by the assembly process when the source code is read in, and not at all by the interpretation/execution process when the program is "run".

In a really serious implementation of an interpreter it would be worth carrying out further experiments to determine the optimal mapping, based, for example, on benchmarks carried out on a variety of programs. (These timings were done fairly roughly on a stopwatch; one should really have run the simulations many times over and for higher numbers of iterations, but the effects show up readily enough.)
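
A less rough way of timing might look something like the following sketch, which repeats a task several times and reports the fastest run. The class name, the bestOf method and the stand-in workload are all invented for the illustration; in practice the task would be a call that interprets S1.PVM or S2.PVM.

```java
public class TimingSketch {
    // Runs the task the given number of times and returns the
    // fastest wall-clock time observed, in seconds.
    static double bestOf(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (elapsed < best) best = elapsed;
        }
        return best / 1e9;
    }

    public static void main(String[] args) {
        // Stand-in workload only; it does not interpret any PVM code.
        Runnable work = () -> {
            long s = 0;
            for (int i = 0; i < 5_000_000; i++) s += i % 7;
            if (s < 0) System.out.println(s);   // keeps the loop from being optimized away
        };
        System.out.printf("best of 5: %.4f s%n", bestOf(5, work));
    }
}
```

Taking the best (or the median) of several runs reduces the influence of other activity on the machine, which a single stopwatch reading cannot do.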

Only one team came up with any suggestions for how the interpreter could be improved still further. This can be done in various ways, for example by "inlining" the code that is currently executed by calls to the next, push and pop routines, and it was disappointing that nobody bothered to try this. Of course it means quite a lot of changes have to be made. The solution kit shows this in detail.

   1000 iterations, 1000 upper limit, times in seconds (Win XP, 3GHz machine)

                                  S1.PVM        S2.PVM

   Opcode set                     Original      Optimized
   High numbers                     3.47          3.04     (88%)
   High numbers, checks removed     1.98          2.18     (110%)
   Low numbers                      4.71          2.99     (63%)
   Low numbers, checks removed      3.32          2.00     (60%)
   Operations                    92,442,039    64,889,031  (70%)

   200 iterations, 4000 upper limit, times in seconds (Win XP, 3GHz machine)

                                  S1.PVM        S2.PVM

   Opcode set                     Original      Optimized
   High numbers                     2.97          2.54     (86%)
   High numbers, checks removed     1.69          1.84     (109%)
   Low numbers                      3.88          2.47     (64%)
   Low numbers, checks removed      2.78          1.67     (60%)
   Operations                    77,121,239    54,092,231  (70%)

The "checks removed" figures were obtained using variations of the interpreter source in which all the checks that CPU.SP remained in bounds had been suppressed, as well as the calls to next, push and pop (their effect was achieved by "inlining" the equivalent code). One can see that an insistence on safety results in a considerable loss of run-time efficiency.
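
As a rough idea of what such inlining involves, here is a sketch of an ADD operation written both ways. It is not taken from the solution kit: the class name, the tiny memory and the unchecked push and pop routines are assumptions made for the illustration.

```java
public class InlineSketch {
    static int[] mem = new int[16];
    static int sp = mem.length;        // the stack grows downwards from the top of memory

    static void push(int v) { mem[--sp] = v; }
    static int pop()        { return mem[sp++]; }

    // ADD written with calls to pop and push.
    static void addWithCalls() {
        int tos = pop();
        push(pop() + tos);
    }

    // The same ADD with the bodies of pop and push written out in place.
    static void addInlined() {
        int tos = mem[sp++];
        mem[sp] += tos;                // second pop, the addition and the push collapse into one update
    }

    public static void main(String[] args) {
        push(3); push(4);
        addWithCalls();
        System.out.println(mem[sp]);   // 7
        push(10);
        addInlined();
        System.out.println(mem[sp]);   // 17
    }
}
```

Every case arm of the interpreter needs the same treatment, which is why the change is tedious to make by hand.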

I ran the simulations again using C# implementations of the system - the source code is to all intents and purposes identical:

   1000 iterations, 1000 upper limit, times in seconds (Win XP, 3GHz machine)

                                  S1.PVM        S2.PVM

   Opcode set                     Original      Optimized
   High numbers                     2.85          2.06     (72%)
   High numbers, checks removed     2.03          0.95     (49%)
   Low numbers                      2.80          1.97     (71%)
   Low numbers, checks removed      1.74          0.83     (48%)
   Operations                    92,442,039    64,889,031  (70%)

   200 iterations, 4000 upper limit, times in seconds (Win XP, 3GHz machine)

                                  S1.PVM        S2.PVM

   Opcode set                     Original      Optimized
   High numbers                     2.35          1.79     (76%)
   High numbers, checks removed     1.70          0.81     (48%)
   Low numbers                      2.37          1.67     (70%)
   Low numbers, checks removed      1.50          0.68     (45%)
   Operations                    77,121,239    54,092,231  (70%)

Interestingly, the C# system is "faster" than the Java one, and there is less variation in timing between the "high" and "low" number mappings of the opcodes.

Task 9 - Nothing like practice to make things perfect

This example aimed to demonstrate the use of the Boolean opcodes. Here is a solution, also making use of the new opcodes (a solution using the original opcodes would have been acceptable, of course):

  0 DSP    3      ; v0 is x, v1 is y, v2 is z        23 PRNS   "\n"   ;       write("\n");
  2 PRNS   "   X     Y     Z     X OR !Y AND Z\n"    25 LDL_2
  4 LDC_0                                            26 NOT
  5 STL_0         ; x = false;                       27 STL_2         ;       Z = ! Z;
  6 LDC_0         ; repeat                           28 LDL_2
  7 STL_1         ;   y = false;                     29 NOT
  8 LDC_0         ;   repeat                         30 BZE    10     ;     until !Z;
  9 STL_2         ;     z = false;                   32 LDL_1
 10 LDL_0         ;     repeat                       33 NOT
 11 PRNB          ;       write(x);                  34 STL_1         ;     Y = ! Y;
 12 LDL_1                                            35 LDL_1
 13 PRNB          ;       write(y);                  36 NOT
 14 LDL_2                                            37 BZE    8      ;   until !Y;
 15 PRNB          ;       write(z);                  39 LDL_0
 16 LDL_0                                            40 NOT
 17 LDL_1                                            41 STL_0         ;   X = !X;
 18 NOT           ;       (not y)                    42 LDL_0
 19 LDL_2                                            43 NOT
 20 AND           ;       (not y and z)              44 BZE    6      ; until !X;
 21 OR            ;       x or (not y and z)         46 HALT
 22 PRNB          ;       write(x || !Y && Z);
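
To check the output of the PVM code by hand it may help to have the same algorithm in Java, with each repeat ... until loop modelled by a do ... while loop. This is only a sketch; the class name, the method names and the column layout are invented here and need not match the output of PRNB.

```java
public class TruthTable {
    // The expression being tabulated; && binds more tightly than ||.
    static boolean f(boolean x, boolean y, boolean z) {
        return x || !y && z;
    }

    // Builds the table as a string, one row per combination of x, y, z.
    static String table() {
        StringBuilder sb = new StringBuilder("X     Y     Z     X OR !Y AND Z\n");
        boolean x = false, y = false, z = false;
        do {                                   // repeat
            y = false;
            do {                               //   repeat
                z = false;
                do {                           //     repeat
                    sb.append(String.format("%-6b%-6b%-6b%b\n", x, y, z, f(x, y, z)));
                    z = !z;
                } while (z);                   //     until !z
                y = !y;
            } while (y);                       //   until !y
            x = !x;
        } while (x);                           // until !x
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table());
    }
}
```

Each loop runs its body twice, once with its variable false and once with it true, so the table has eight rows.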

Task 10 - Arrays in the PVM

A translation of SEARCH1.PAV using the new opcodes follows. One using the original opcodes appears in the solution kit.

  0 DSP    3      ; v0 is item, v1 is i, v2 is list  42 LDL_2
  2 LDC    11                                        43 LDL_1
  4 ANEW                                             44 LDXA
  5 STL_2         ; int[] list = new int[11]         45 LDV
  6 LDC_1                                            46 LDL_0
  7 STL_1         ; i = 1;                           47 CNE           ;   while (list[i] != item) {
  8 LDL_1                                            48 BZE    56
  9 LDC    10                                        50 LDL_1
 11 CLE           ; while (i <= 10) {                51 LDC_1
 12 BZE    24                                        52 SUB
 14 LDL_2                                            53 STL_1         ;     i = i - 1;
 15 LDL_1                                            54 BRN    42     ;   }
 16 LDXA                                             56 LDL_1
 17 INPI          ;   read(list[i]);                 57 LDC_0
 18 LDL_1                                            58 CEQ           ;   if (i == 0)
 19 LDC_1                                            59 BZE    63
 20 ADD                                              61 PRNS   "Not found\n"
 21 STL_1         ;   i = i + 1;                     63 LDL_1
 22 BRN    8      ; }                                64 LDC_0
 24 PRNS   "Search for (99 stops)? "                 65 CNE           ;   if (i != 0) {
 26 LDA_0                                            66 BZE    74
 27 INPI          ; read(item)                       68 PRNS   "Found in position"
 28 LDL_0                                            70 LDL_1
 29 LDC    99                                        71 PRNI          ;     write(i);
 31 CNE           ; while (item != 99) {             72 PRNS   "\n"   ;     write("\n");
 32 BZE    78                                        74 LDA_0         ;   }
 34 LDL_2                                            75 INPI          ;   read(item);
 35 LDC_0                                            76 BRN    28     ; }
 36 LDXA                                             78 HALT
 37 LDL_0
 38 STO           ;   list[0] = item;
 39 LDC    10
 41 STL_1         ;   i = 10;

As always, a shorter solution still could have been found. For example, the sequence LDC_0 LDXA at 35-36 could have been omitted (can you see why?).