There were some very good solutions submitted, and some very energetic ones too - clearly a lot of students had put in many hours developing their code. This is very encouraging. Do learn to put your names into the introductory comments of programs that you write.
Full source for the solutions summarized here can be found in the ZIP file on the servers - PRAC20A.ZIP. These sources also include the extensions needed to handle character I/O and storage, as explored in the prac test.
At the heart of this lay a search loop:
    read("Search for (99 stops)? ", item);   // first search item
    while (item != 99) {
      i = 1;
      while (list[i] != item) i = i + 1;
It's an awful program because the array subscript goes out of bounds if the item cannot be located.
The better version of the program, in the file SEARCH1.PAV, contained code that read

      list[0] = item;                        // post as a sentinel
      i = 10;
      while (list[i] != item) i = i - 1;     // must terminate!
This is much better because it uses a sentinel technique for the linear search. However, it can still be broken if you give it data that is out of range of the int type, give it non-numeric data, or supply 99 as one of the initial list of 10 numbers.
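The sentinel idea carries over directly to other languages. Here is a minimal Java sketch of it; the class and method names are my own, for illustration only, and the array is assumed to keep element 0 free for the sentinel, as in SEARCH1.PAV.

```java
public class SentinelSearch {
    // Returns the position (1 .. list.length - 1) of item in the list, or 0 if it is absent.
    // list[0] is reserved for the sentinel and is overwritten on every call.
    static int search(int[] list, int item) {
        list[0] = item;              // post the sentinel
        int i = list.length - 1;     // start at the last real element
        while (list[i] != item) {    // must terminate, because list[0] == item
            i = i - 1;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] list = {0, 12, 7, 25, 3};          // list[0] is the sentinel slot
        System.out.println(search(list, 7));     // found at position 2
        System.out.println(search(list, 42));    // absent, so 0
    }
}
```

A result of 0 means "not found", so the loop needs no separate bounds test.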
Task 5 was to hand-compile the Factorial program into PVM code. Most people got a long way towards this. Have a look at how I have commented this, using "high level" code, rather than detailed line by line commentary of the form "load address of X". Most of the submissions had "commentary" that was, frankly, almost useless. Try the following test for assembler code: Cover over the real code with a piece of paper and read only the comments. Does what you read make sense on its own? I maintain that it should. The easiest way to do this is by using a high level algorithmic notation.
     0  DSP   3        ; n is v0, f is v1, i is v2
     2  LDA   0
     4  LDC   1
     6  STO            ; n = 1;
     7  LDA   0
     9  LDV
    10  LDC   20       ; // max = 20, constant
    12  CLE            ; while (n <= max) {
    13  BZE   78
    15  LDA   1
    17  LDC   1
    19  STO            ;   f = 1;
    20  LDA   2
    22  LDA   0
    24  LDV
    25  STO            ;   i = n;
    26  LDA   2
    28  LDV
    29  LDC   0
    31  CGT            ;   while (i > 0) {
    32  BZE   55
    34  LDA   1
    36  LDA   1
    38  LDV
    39  LDA   2
    41  LDV
    42  MUL
    43  STO            ;     f = f * i;
    44  LDA   2
    46  LDA   2
    48  LDV
    49  LDC   1
    51  SUB
    52  STO            ;     i = i - 1;
    53  BRN   26       ;   }
    55  LDA   0
    57  LDV
    58  PRNI           ;   write(n);
    59  PRNS  "! = "   ;   write("! = ");
    61  LDA   1
    63  LDV
    64  PRNI           ;   write(f);
    65  PRNS  "\n"     ;   write("\n") (or use PRNL)
    67  LDA   0
    69  LDA   0
    71  LDV
    72  LDC   1
    74  ADD
    75  STO            ;   n = n + 1;
    76  BRN   7        ; }
    78  HALT
Checking for overflow in multiplication and division was not well done. You cannot multiply and then try to check overflow (it is too late by then) - you have to detect it in a more subtle way. Here is one way of doing it - note the check to prevent a division by zero. This does not use any precision greater than that of the simulated machine itself. Note that it is necessary to check for "division by zero" in the rem code as well!
    case PVM.mul:           // integer multiplication
      tos = pop(); sos = pop();
      if (tos != 0 && Math.abs(sos) > maxInt / Math.abs(tos)) ps = badVal;
      else push(sos * tos);
      break;
    case PVM.div:           // integer division (quotient)
      tos = pop();
      if (tos == 0) ps = divZero;
      else push(pop() / tos);
      break;
    case PVM.rem:           // integer division (remainder)
      tos = pop();
      if (tos == 0) ps = divZero;
      else push(pop() % tos);
      break;
It is possible to use an intermediate long variable (but don't forget the casting operations or the abs function):
    case PVM.mul:           // integer multiplication
      tos = pop(); sos = pop();
      long temp = (long) sos * (long) tos;
      if (Math.abs(temp) > maxInt) ps = badVal;
      else push(sos * tos);
      break;
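The two multiplication tests can be compared side by side in a small stand-alone Java sketch. The class name, the method names and MAX_INT are illustrative assumptions and not part of the solution kit; MAX_INT merely stands in for the interpreter's maxInt and is set to Integer.MAX_VALUE here for the sake of the example.

```java
public class MulOverflow {
    // Assumed limit of the simulated machine (a stand-in for the interpreter's maxInt).
    static final int MAX_INT = Integer.MAX_VALUE;

    // Division-based test: decides before multiplying, using nothing wider than int.
    static boolean overflowsByDivision(int sos, int tos) {
        return tos != 0 && Math.abs(sos) > MAX_INT / Math.abs(tos);
    }

    // Widening test: form the exact product as a long, then range-check it.
    static boolean overflowsByWidening(int sos, int tos) {
        long temp = (long) sos * (long) tos;
        return Math.abs(temp) > MAX_INT;
    }

    public static void main(String[] args) {
        System.out.println(overflowsByDivision(46340, 46340));   // false: product still fits
        System.out.println(overflowsByDivision(46341, 46341));   // true:  product too large
        System.out.println(overflowsByWidening(46341, -46341));  // true:  sign does not matter
    }
}
```

One caveat worth knowing: in Java, Math.abs(Integer.MIN_VALUE) is still negative, so with MAX_INT as assumed here the division-based test is only reliable if that one value never reaches the stack.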
This is straightforward, if a little tedious, and it is easy to leave some of the changes out and get a corrupted solution. The PVMAsm class requires modification in the switch statement that recognizes two-word opcodes:
    case PVM.brn:           // all require numeric address field
    ...
    case PVM.ldc:
    case PVM.ldl:           // +++++++++++++++++ addition
    case PVM.stl:           // +++++++++++++++++ addition
      codeLen = (codeLen + 1) % PVM.memSize;
      if (ch == '\n')       // no field could be found
        error("Missing address", codeLen);
      else {                // unpack it and store
        PVM.mem[codeLen] = src.readInt();
        if (src.error()) error("Bad address", codeLen);
      }
      break;
The PVM class requires several additions. We must add to the enumeration of the machine opcodes:
    public static final int // Machine opcodes
      ...
      ldl     =  63,        // +++++++++++++++++ additions
      stl     =  64,
      lda_0   =  65,
      ...
We must add to the switch statement in the trace method (several submissions missed this):
    static void trace(OutFile results, int pcNow) {
      switch (cpu.ir) {
        ...
        case PVM.ldl:       // +++++++++++++++++ addition
        case PVM.stl:       // +++++++++++++++++ addition
      }
      results.writeLine();
    }
and we must provide case arms for all the new opcodes. A selection of these follows; the rest can be seen in the solution kit. Notice that for consistency all the "inBounds" checks should be performed on the new opcodes too (several submissions missed this).
    case PVM.ldc_m1:        // push constant -1
      push(-1);
      break;
    case PVM.ldc_0:         // push constant 0
      push(0);
      break;
    case PVM.ldc_1:         // push constant 1
      push(1);
      break;
    ...
    case PVM.lda_0:         // push local address 0
      adr = cpu.fp - 1;
      if (inBounds(adr)) push(adr);
      break;
    case PVM.lda_1:         // push local address 1
      adr = cpu.fp - 2;
      if (inBounds(adr)) push(adr);
      break;
    ...
    case PVM.ldl:           // push local value
      adr = cpu.fp - 1 - next();
      if (inBounds(adr)) push(mem[adr]);
      break;
    case PVM.ldl_0:         // push value of local variable 0
      adr = cpu.fp - 1;
      if (inBounds(adr)) push(mem[adr]);
      break;
    case PVM.ldl_1:         // push value of local variable 1
      adr = cpu.fp - 2;
      if (inBounds(adr)) push(mem[adr]);
      break;
    ...
    case PVM.stl:           // store local value
      adr = cpu.fp - 1 - next();
      if (inBounds(adr)) mem[adr] = pop();
      break;
    case PVM.stl_0:         // pop to local variable 0
      adr = cpu.fp - 1;
      if (inBounds(adr)) mem[adr] = pop();
      break;
    case PVM.stl_1:         // pop to local variable 1
      adr = cpu.fp - 2;
      if (inBounds(adr)) mem[adr] = pop();
      break;

We must add to the method that lists out the code (several submissions missed this):

    public static void listCode(String fileName, int codeLen) {
      ...
          case PVM.brn:
          case PVM.ldc:
          case PVM.ldl:     // +++++++++++++++++ addition
          case PVM.stl:     // +++++++++++++++++ addition
            i = (i + 1) % memSize;
            codeFile.write(mem[i]);
            break;
Finally we must add to the section that initializes the mnemonic lookup table:
    public static void init() {
      ...
      mnemonics[PVM.ldl]      = "LDL";    // +++++++++++++++++ additions
      mnemonics[PVM.stl]      = "STL";
      mnemonics[PVM.lda_0]    = "LDA_0";
      ...
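To see the frame-relative addressing of the LDL and STL case arms in isolation, it may help to try it on a toy machine. The following is only a sketch under stated assumptions: the memory size, the simplified inBounds test and the "bad" flag are hypothetical stand-ins for what the real interpreter does, and the stack is assumed to grow downwards with the locals sitting just below fp.

```java
public class LocalAccessSketch {
    static final int MEM_SIZE = 16;        // hypothetical, tiny on purpose
    static int[] mem = new int[MEM_SIZE];
    static int fp = MEM_SIZE;              // frame pointer: locals live just below it
    static int sp = MEM_SIZE - 3;          // as if DSP 3 had reserved three locals
    static boolean bad = false;            // stands in for setting ps to an error state

    // Simplified bounds test (the real one is assumed to be stricter).
    static boolean inBounds(int adr) {
        if (adr < 0 || adr >= MEM_SIZE) { bad = true; return false; }
        return true;
    }

    static void push(int value) { mem[--sp] = value; }
    static int pop() { return mem[sp++]; }

    // LDL offset: push the value of the local variable at the given offset.
    static void ldl(int offset) {
        int adr = fp - 1 - offset;
        if (inBounds(adr)) push(mem[adr]);
    }

    // STL offset: pop the top of stack into the local variable at the given offset.
    static void stl(int offset) {
        int adr = fp - 1 - offset;
        if (inBounds(adr)) mem[adr] = pop();
    }

    public static void main(String[] args) {
        push(5); stl(1);                   // v1 = 5
        ldl(1);                            // push v1 again
        System.out.println(pop());         // 5
    }
}
```

The point to notice is that a single address calculation replaces the LDA / LDV or LDA / STO pairs of the original opcode set.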
As an example of using the new opcodes, here is the Factorial program recoded in considerably fewer operations. Many submissions only used some of the new opcodes, ignoring the STL ones, for example.
     0  DSP   3        ; n is v0, f is v1, i is v2
     2  LDC_1
     3  STL_0          ; n = 1;
     4  LDL_0
     5  LDC   20       ; // max = 20, constant
     7  CLE            ; while (n <= max) {
     8  BZE   43
    10  LDC_1
    11  STL_1          ;   f = 1;
    12  LDL_0
    13  STL_2          ;   i = n;
    14  LDL_2
    15  LDC_0
    16  CGT            ;   while (i > 0) {
    17  BZE   29
    19  LDL_1
    20  LDL_2
    21  MUL
    22  STL_1          ;     f = f * i;
    23  LDL_2
    24  LDC_1
    25  SUB
    26  STL_2          ;     i = i - 1;
    27  BRN   14       ;   }
    29  LDL_0
    30  PRNI           ;   write(n);
    31  PRNS  "! = "   ;   write("! = ");
    33  LDL_1
    34  PRNI           ;   write(f);
    35  PRNS  "\n"     ;   write("\n") (or use PRNL)
    37  LDL_0
    38  LDC_1
    39  ADD
    40  STL_0          ;   n = n + 1;
    41  BRN   4        ; }
    43  HALT
Surprisingly, no. In the prac worksheet the suggestion was made that you study the original source to see that the original opcodes had been mapped onto the numbers 30 .. 62. This meant that you could map the new opcodes onto a set of numbers below 30, or above 62. In the prac solution kit you will find four versions of the interpreter in which this has been done.
The following table shows various timings obtained on the four systems for two encodings of the infamous Sieve of Eratosthenes, differing only in that one used the compact opcodes where possible. The behaviour is quite remarkable. The optimized opcode set resulted in the execution of about 30% fewer instructions over counts running into millions, and when the opcodes were mapped onto "high" internal numbers the overall execution time dropped to about 87% of that for the original opcode set. However, when mapped onto low numbers the code using the unoptimized opcode set took far longer to run, while that using the optimized opcode set took slightly less time to run. Since the only difference in the source code of the interpreter was to be found in this numerical mapping, one is forced to conclude that the underlying implementation of the large switch statement plays a key role in the performance one can expect.

Several submissions suggested that the differences could be explained away by the longer list of opcodes and the (relatively) slow lookup process that forms the basis of the opCode method in the PVM.java file (at least, that is what I think the authors were trying to say; some explanations were very badly expressed!). But this has nothing to do with it - that method is used by the assembly process when the source code is read in, and not at all by the interpretation/execution process when the program is "run".
In a really serious implementation of an interpreter it would be worth carrying out further experiments to determine the optimal mapping, based, for example, on benchmarks carried out on a variety of programs. (These timings were done fairly roughly on a stopwatch; one should really have run the simulations many times over and for higher numbers of iterations, but the effects show up readily enough.)
Only one team came up with any suggestions for how the interpreter could be improved still further. This can be done in various ways, for example by "inlining" the code that is currently executed by calls to the next, push and pop routines, and it was disappointing that nobody bothered to try this. Of course it means quite a lot of changes have to be made. The solution kit shows this in detail.

    1000 iterations, 1000 upper limit, times in seconds (Win XP, 3GHz machine)

                                        S1.PVM        S2.PVM
      Opcode set                       Original      Optimized

      High numbers                        3.47          3.04    (88%)
      High numbers, checks removed        1.98          2.18   (110%)
      Low numbers                         4.71          2.99    (63%)
      Low numbers, checks removed         3.32          2.00    (60%)

      Operations                    92,442,039    64,889,031    (70%)

    200 iterations, 4000 upper limit, times in seconds (Win XP, 3GHz machine)

                                        S1.PVM        S2.PVM
      Opcode set                       Original      Optimized

      High numbers                        2.97          2.54    (86%)
      High numbers, checks removed        1.69          1.84   (109%)
      Low numbers                         3.88          2.47    (64%)
      Low numbers, checks removed         2.78          1.67    (60%)

      Operations                    77,121,239    54,092,231    (70%)

The "checks removed" figures were obtained using variations of the interpreter source in which all the checks that CPU.SP remained in bounds had been suppressed, as well as the calls to next, push and pop (their effect was achieved by "inlining" the equivalent code). One can see that an insistence on safety results in a considerable loss of run-time efficiency.
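To make the idea of "inlining" concrete, here is a rough Java sketch of an addition step written both ways. It is not taken from the solution kit: the class, the tiny memory and the unchecked push and pop helpers are assumptions made for illustration, with the stack again assumed to grow downwards.

```java
public class InlineSketch {
    static int[] mem = new int[8];         // hypothetical, tiny on purpose
    static int sp = 8;                     // empty stack; grows towards lower addresses

    static void push(int value) { mem[--sp] = value; }
    static int pop() { return mem[sp++]; }

    // ADD written with calls to the helper routines.
    static void addWithCalls() {
        int tos = pop();
        push(pop() + tos);
    }

    // The same effect with the helper bodies written out in place:
    // discard the top element and add it into the one beneath.
    static void addInlined() {
        sp++;
        mem[sp] += mem[sp - 1];
    }

    public static void main(String[] args) {
        push(3); push(4); addWithCalls();
        System.out.println(pop());         // 7
        push(3); push(4); addInlined();
        System.out.println(pop());         // 7
    }
}
```

Neither version checks that sp stays in bounds, which is exactly the safety that the "checks removed" variants give up.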
I ran the simulations again using C# implementations of the system - the source code is to all intents and purposes identical:
    1000 iterations, 1000 upper limit, times in seconds (Win XP, 3GHz machine)

                                        S1.PVM        S2.PVM
      Opcode set                       Original      Optimized

      High numbers                        2.85          2.06    (72%)
      High numbers, checks removed        2.03          0.95    (49%)
      Low numbers                         2.80          1.97    (71%)
      Low numbers, checks removed         1.74          0.83    (48%)

      Operations                    92,442,039    64,889,031    (70%)

    200 iterations, 4000 upper limit, times in seconds (Win XP, 3GHz machine)

                                        S1.PVM        S2.PVM
      Opcode set                       Original      Optimized

      High numbers                        2.35          1.79    (76%)
      High numbers, checks removed        1.70          0.81    (48%)
      Low numbers                         2.37          1.67    (70%)
      Low numbers, checks removed         1.50          0.68    (45%)

      Operations                    77,121,239    54,092,231    (70%)
Interestingly, the C# system is "faster" than the Java one, and there is less variation in timing between the "high" and "low" number mappings of the opcodes.
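The stopwatch timings reported above were admittedly rough. If you want to repeat such experiments more carefully, one possibility is to time the work from within the program, repeat the run several times and report the median. The sketch below shows the idea only; the class, the method names and the stand-in sieve workload are my own assumptions, and in practice the Runnable would wrap a call to the interpreter.

```java
import java.util.Arrays;

public class TimingSketch {
    // Runs the task the given number of times and returns the
    // middle element of the sorted wall-clock times, in milliseconds.
    static double medianMillis(Runnable task, int repeats) {
        double[] times = new double[repeats];
        for (int r = 0; r < repeats; r++) {
            long start = System.nanoTime();
            task.run();
            times[r] = (System.nanoTime() - start) / 1e6;
        }
        Arrays.sort(times);
        return times[repeats / 2];
    }

    // Stand-in workload: counts the primes below limit with a sieve.
    static int sieve(int limit) {
        boolean[] composite = new boolean[limit];
        int count = 0;
        for (int i = 2; i < limit; i++) {
            if (!composite[i]) {
                count++;
                for (long j = (long) i * i; j < limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double ms = medianMillis(() -> sieve(4000), 21);
        System.out.println("median time in ms: " + ms);
    }
}
```

Taking the median rather than the mean makes the figure less sensitive to the occasional slow run, such as the first few, when the Java system is still warming up.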
This example aimed to demonstrate the use of the Boolean opcodes. Here is a solution, also making use of the new opcodes (a solution using the original opcodes would have been acceptable, of course):

     0  DSP   3        ; v0 is x, v1 is y, v2 is z
     2  PRNS  " X Y Z X OR !Y AND Z\n"
     4  LDC_0
     5  STL_0          ; x = false;
     6  LDC_0          ; repeat
     7  STL_1          ;   y = false;
     8  LDC_0          ;   repeat
     9  STL_2          ;     z = false;
    10  LDL_0          ;     repeat
    11  PRNB           ;       write(x);
    12  LDL_1
    13  PRNB           ;       write(y);
    14  LDL_2
    15  PRNB           ;       write(z);
    16  LDL_0
    17  LDL_1
    18  NOT            ;       (not y)
    19  LDL_2
    20  AND            ;       (not y and z)
    21  OR             ;       x or (not y and z)
    22  PRNB           ;       write(x || !Y && Z);
    23  PRNS  "\n"     ;       write("\n");
    25  LDL_2
    26  NOT
    27  STL_2          ;       Z = ! Z;
    28  LDL_2
    29  NOT
    30  BZE   10       ;     until !Z;
    32  LDL_1
    33  NOT
    34  STL_1          ;     Y = ! Y;
    35  LDL_1
    36  NOT
    37  BZE   8        ;   until !Y;
    39  LDL_0
    40  NOT
    41  STL_0          ;   X = !X;
    42  LDL_0
    43  NOT
    44  BZE   6        ; until !X;
    46  HALT
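If the nested repeat loops in the commentary are hard to follow, it may help to see the same algorithm in Java. This is a sketch only, with names of my own choosing; Java has no "repeat ... until", so "do ... while (v)" plays the part of "repeat ... until !v".

```java
public class TruthTable {
    // The expression tabulated by the PVM program; AND binds more tightly than OR.
    static boolean f(boolean x, boolean y, boolean z) {
        return x || !y && z;
    }

    // Builds the eight rows of the table, one per line.
    static String table() {
        StringBuilder sb = new StringBuilder();
        boolean x = false;
        do {
            boolean y = false;
            do {
                boolean z = false;
                do {
                    sb.append(x).append(' ').append(y).append(' ')
                      .append(z).append(' ').append(f(x, y, z)).append('\n');
                    z = !z;
                } while (z);               // until !z
                y = !y;
            } while (y);                   // until !y
            x = !x;
        } while (x);                       // until !x
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(table());
    }
}
```

Each loop runs exactly twice, because toggling a Boolean from false gives true on the first pass and false again on the second.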
A translation of SEARCH1.PAV using the new opcodes follows. One using the original opcodes appears in the solution kit.
     0  DSP   3        ; v0 is item, v1 is i, v2 is list
     2  LDC   11
     4  ANEW
     5  STL_2          ; int[] list = new int[11]
     6  LDC_1
     7  STL_1          ; i = 1;
     8  LDL_1
     9  LDC   10
    11  CLE            ; while (i <= 10) {
    12  BZE   24
    14  LDL_2
    15  LDL_1
    16  LDXA
    17  INPI           ;   read(list[i]);
    18  LDL_1
    19  LDC_1
    20  ADD
    21  STL_1          ;   i = i + 1;
    22  BRN   8        ; }
    24  PRNS  "Search for (99 stops)? "
    26  LDA_0
    27  INPI           ; read(item)
    28  LDL_0
    29  LDC   99
    31  CNE            ; while (item != 99) {
    32  BZE   78
    34  LDL_2
    35  LDC_0
    36  LDXA
    37  LDL_0
    38  STO            ;   list[0] = item;
    39  LDC   10
    41  STL_1          ;   i = 10;
    42  LDL_2
    43  LDL_1
    44  LDXA
    45  LDV
    46  LDL_0
    47  CNE            ;   while (list[i] != item) {
    48  BZE   56
    50  LDL_1
    51  LDC_1
    52  SUB
    53  STL_1          ;     i = i - 1;
    54  BRN   42       ;   }
    56  LDL_1
    57  LDC_0
    58  CEQ            ;   if (i == 0)
    59  BZE   63
    61  PRNS  "Not found\n"
    63  LDL_1
    64  LDC_0
    65  CNE            ;   if (i != 0) {
    66  BZE   74
    68  PRNS  "Found in position"
    70  LDL_1
    71  PRNI           ;     write(i);
    72  PRNS  "\n"     ;     write("\n");
    74  LDA_0          ;   }
    75  INPI           ;   read(item);
    76  BRN   28       ; }
    78  HALT
As always, a shorter solution still could have been found. For example, the sequence LDC_0 LDXA at 35-36 could have been omitted (can you see why?).