## Computer Science 3 - 2017 # **Programming Language Translation** ## Practical for Week 2, beginning 24 July 2017 - Solutions There were some very good solutions submitted, and some energetic ones too - clearly a lot of students had put in many hours developing their code. This is very encouraging, but there was also evidence of "sharing" out the tasks, not really working together a proper group, and not developing an interpreter that was up to the later tasks. And do learn to put your names into the introductory comments of programs that you write. Full source for the solutions summarized here can be found in the ZIP file on the servers - PRAC2A.ZIP Task 3 involved reading some Parva code for a simple algorithm and then adding suitable commentary. It is highly recommended that you adopt the style shown below, where the higher level code acts as commentary, rather than adopting a line by line explanation of each mnemonic/opcode. ``` ; Demonstrate de Morgan's Laws ; P.D. Terry, Rhodes University, 2017 0 DSP ; bool x is v0, y is v1 (X.Y)\' X\'+Y\' (X+Y)\' X\'.Y\'\n\n" 2 PRNS 4 LDA 0 47 OR 48 NOT 6 LDC 0 write(!(x | y)); 8 STO x = false: 49 PRNI 0 9 LDA 1 repeat 50 LDA 11 LDC 52 LDV 13 STO y = false; 53 NOT 14 LDA 0 54 LDA 1 repeat 56 LDV 16 I DV 17 PRNI write(x); 57 NOT 18 LDA 58 AND 20 LDV 59 PRNI write(!x && !y); "\n" 21 PRNI write(y); 60 PRNS writeLine(); 0 22 LDA 62 LDA 24 LDV 64 LDA 25 LDA 1 66 LDV 27 LDV 67 NOT 68 STO 28 AND y = !y; 69 LDA 29 NOT 1 30 PRNI write(!(x && y)); 71 LDV 72 NOT 31 LDA 33 LDV 73 BZE 14 until (!y); 75 LDA 34 NOT 0 n 35 LDA 1 77 LDA 37 LDV 79 LDV 38 NOT 80 NOT 39 OR 81 STO x = !x; 40 PRNI write(!x | !y); 0 82 LDA 41 LDA 0 84 LDV 43 LDV 85 NOT 44 LDA 86 BZE until (!x); 46 LDV 88 HALT ; System Exit(); ``` It is easy to see that this does not use short circuit evaluation of Boolean expressions, as it uses AND and OR, which are infix operators that requires their two operands both to have been evaluated and pushed onto the expression stack. However, it is easy to eliminate the AND and OR by introducing "jumping code" as it is sometimes called. We rely on the idea that for short-circuit semantics to hold we can write the following logical identities: ``` x AND y \equiv if x then y else false x OR y \equiv if x then true else y ... ``` If we apply them to an analysis of various Boolean expressions in the algorithm we could also use ``` !x AND !y = if !x then !y else false !x OR !y = if !x then true else !y ... ``` Admittedly this has quite a lot more code than the binary operator code in the original. However, short-circuited Boolean evaluation is so much better that it is worth developing special opcodes to achieve it, as we shall see later in the course. ``` ; Demonstrate de Morgan's Laws P.D. Terry, Rhodes University, 2017 using short-circuit semantics bool x is v0, y is v1 Y (X.Y)\' X\'+Y\' 0 DSP (X+Y)\' X\'.Y\'\n\n" 2 PRNS Χ 0 4 LDA 6 LDC 0 58 BRN 8 STO x = false; 60 LDA 9 LDA 1 62 LDV repeat 63 NOT 11 LDC 0 write(!(x | y)); y = false; 64 PRNI 13 STO 0 14 LDA repeat 65 LDA 0 16 LDV 67 LDV 17 PRNI 68 BZE write(x); 74 18 LDA 70 LDC 0 1 20 LDV 72 BRN 78 21 PRNI write(y); 74 LDA 22 LDA 76 LDV 24 LDV 77 NOT write(!x && !y); 25 BZE 32 78 PRNI "\n" 79 PRNS writeLine(); 27 LDA 29 LDV 81 LDA 1 30 BRN 34 83 LDA 32 LDC 85 LDV 86 NOT 34 NOT 35 PRNI write(!(x && y)); 87 STO y = !y; 36 LDA 0 88 LDA 90 LDV 38 LDV 39 NOT 91 NOT 40 BZE 46 92 BZE 14 until (!y); 43 LDC 94 LDA 0 44 BRN 50 96 LDA 0 98 LDV 46 LDA 48 LDV 99 NOT 49 NOT 100 STO x = !x; write(!x || !y); 50 PRNI 101 LDA O 51 LDA 0 103 LDV 53 LDV 104 NOT 60 9 until (!x); 54 BZE 105 BZE 107 HALT System Exit(); 56 LDC ``` It might be possible to manipulate these logical expressions to make for an even shorter solution, and you might like to puzzle out how this can be done. See also the discussion on the course web site. #### Task 4 - Execution overheads - part one See discussion of Task 10 below. ### Task 5 - Coding the hard way Task 5 was to hand-compile the Factorial program into PVM code. Most people got a long way towards this. Once again, look at how I have commented this, using "high level" code. ``` ; n is v0, f is v1, i is v2 0 DSP 42 MUL 2 LDA 43 STO 4 LDC 44 LDA f = f * i; 6 STO 7 LDA ; n = 1; 46 LDA n 48 LDV 9 LDV 1 49 LDC 10 LDC 12 CLE 13 BZE 20 ; // max = 20, constant 51 SUB 52 STO i = i - 1; ; while (n <= max) { 78 53 BRN 26 15 LDA 17 LDC 19 STO 55 LDA 57 LDV f = 1; 58 PRNI write(n); write("! = "); 20 LDA 22 LDA 2 59 PRNS 61 LDA 24 LDV 25 STO 63 LDV 64 PRNI "\n" 26 LDA 2 65 PRNS 28 LDV 29 LDC 31 CGT 67 LDA 69 LDA 0 0 0 while (i > 0) { 71 LDV 32 BZE 72 LDC 55 1 34 LDA 74 ADD n = n + 1; 36 LDA 1 75 STO ; } 38 LDV 76 BRN 7 39 LDA 2 78 HALT 41 LDV ``` Note that max is a constant, not a variable. There is no need to assign it a variable location and store 20 into this - simply build the value of 20 into the instructions that need to use it. And why on earth redraft the whole algorithm into one that uses a *do-while loop* (or a *repeat-until* loop) in place of the suggested *while* loop? ### Task 6 - Trapping overflow and other pitfalls Checking for overflow in multiplication and division was not always well done. You cannot safely multiply and then try to check overflow (it is too late by then) - you have to detect it in a more subtle way. Here is one way of doing it - note the check to prevent a division by zero when handling multiplication. This does not use any precision greater than that of the simulated machine itself. I don't think many spotted that the PVM. rem opcode also involved division, and some people who thought of using a multiplication overflow check on these lines forgot that numbers to be multiplied can be negative. An alternative, slightlier risky method is shown as a comment - risky because, if the emulator were written in a system that itself trapped multiplicative overflow, it would all blow up anyway. ``` case PVM.mul: // integer multiplication tos = Pop(); sos = Pop(); if (tos != 0 && Math.Abs(sos) > maxInt / Math.Abs(tos)) ps = badVal; ii // riskier // if (tos != 0 && tos * sos / tos != sos) ps = badVal; else Push(sos * tos); break; // integer division (quotient) case PVM div: tos = Pop(); if (tos == 0) ps = divZero; else Push(Pop() / tos); break; // integer division (remainder) case PVM.rem: tos = Pop(); if (tos == 0) ps = divZero; else Push(Pop() % tos); break; or for the "inline" assembler // integer multiplication case PVM.mul: tos = mem[cpu.sp++]; if (tos != 0 && Math.Abs(mem[cpu.sp]) > maxInt / Math.Abs(tos)) ps = badVal; // riskier // if (tos != 0 && tos * mem[cpu.sp] / tos != mem[cpu.sp]) ps = badVal; else mem[cpu.sp] *= tos; break: // integer division (quotient) case PVM div: tos = mem[cpu.sp++]; if (tos != 0) mem[cpu.sp] /= tos; else ps = divZero; break; // integer division (remainder) case PVM rem: tos = mem[cpu.sp++]; if (tos != 0) mem[cpu.sp] %= tos; else ps = divZero; break; ``` It is possible to use an intermediate long variable (but don't forget the casting operations or the Abs function): If given too large an index for an array to handle, a PVM program will terminate with an array bounds error as correctly trapped by the Push/Pop assembler. The same error would not be trapped by the Inline system, which merrily allows the LDXA opcode to wander wheresoever it likes. To fix this requires the following changes to the PVMInline interpreter. This strategy is discussed in the textbook. #### Task 7 - Arrays The code as supplied for tracking students' attendance at a practical suffered from various defects - a "student number" of zero is useless, even though it would be accepted quite happily, a student is able to clock in more than once, the constant StudentsInClass has a misleading value, and if a large negative number is supplied the program crashes. A few simple changes will fix some or all of these. I was happy to accept just one or two of these changes, but here is a rather radical rewrite that embraces them all, and uses the value 0 to terminate the program, just so that you can have a look at how this would have been translated. (STUDENTS1.PAV): ``` void main () { // Track students as they clock in and out of a practical - improved version // P.D. Terry, Rhodes University, 2017 // Improved version const StudentsInClass = 100; bool[] atWork = new bool[StudentsInClass + 1]; // students are numbered 1 .. 100 int student = 1; while (student <= StudentsInClass) {</pre> // nobody is at the practical to start with atWork[student] = false; student = student + 1; read("Student? (> 0 clocks in, < 0 clocks out, 0 terminates) ", student);</pre> while (student != 0) { bool clockingIn = true; // distinguish "in" and "out" easily if (student < 0) { clockingIn = false; // fix the number student = -student; if (student > StudentsInClass) write("Invalid student number\n"); else if (clockingIn) if (atWork[student]) write(student, " has already clocked in!\n"); else atWork[student] = true; else if (!atWork[student]) write(student, " has not yet clocked in!\n"); else atWork[student] = false; read("Student? (> 0 clocks in, < 0 clocks out, 0 terminates) ", student);</pre> } // while write("The following students have still not clocked out\n"); student = 1: while (student <= StudentsInClass) { if (atWork[student]) write(student); student = student + 1; } // while } // main ``` A translation into PVM code is a little tedious, and it is easy to leave some of the code out and get a corrupted solution: ``` Track students as they clock in and out pf a practical P.D. Terry, Rhodes University, 2017 bool[] atwork is v0, int student is v1 0 DSP 3 106 LDXA 2 LDA 0 107 LDV 4 LDC 100 108 BZE 118 if (atWork[student]) 6 LDC 1 110 LDA 1 8 ADD 112 LDV write (student) 9 ANEW 113 PRNI 10 STO " has already clocked in!\n" bool[] atWork = new bool[...] 114 PRNS 128 11 LDA 1 116 BRN 13 LDC 1 118 LDA 0 else 15 STO int student = 1; 120 LDV 16 LDA 1 121 LDA 1 18 LDV 123 LDV 19 LDC 100 124 LDXA 21 CLE 125 LDC 1 while (student <= 100) { 127 STO atWork[student] = true; 22 BZE 24 LDA 128 BRN 159 26 LDV 130 LDA else 0 27 LDA 1 132 LDV 29 LDV 133 LDA 1 30 LDXA 135 LDV 31 LDC 0 136 LDXA atWork[Student] = false; 33 STO 137 I DV 34 LDA 1 138 NOT 36 LDA 139 BZE 149 if (!atWork[student] 1 38 LDV 141 LDA 1 write(student) 39 LDC 1 143 LDV 41 ADD student = student + 1: 144 PRNI " has not yet clocked in!\n" 42 STO 145 PRNS 43 BRN 16 147 BRN 159 "Student? (> 0 clocks in, < 0 ... 149 LDA 45 PRNS 0 47 LDA 1 151 LDV else 49 INPI read(student); 152 LDA 50 LDA 1 154 LDV 52 LDV 155 LDXA 53 LDC 0 156 LDC 0 55 CNE atWork[student] = false]; 158 STO while (student != 0) { "Student? (> 0 clocks in, < 0 ... 56 BZE 166 159 PRNS 58 LDA 2 161 LDA 60 LDC 1 163 INPI read(student) } // while (student != 0) 62 STO bool clockingIn = true; 164 BRN "The following students have still not ... 63 LDA 1 166 PRNS 65 LDV 168 LDA 66 LDC 0 170 LDC 1 68 CLT 172 STO student = 1; 69 BZE 83 if (student < 0) { 173 LDA 1 71 I DA 2 175 I DV 100 73 LDC 0 176 LDC 75 STO clockingIn = false; 178 CLE 76 LDA 179 BZE 206 while (student <= 100 78 LDA 181 LDA 0 1 80 LDV 183 LDV 81 NEG 184 LDA 1 82 STO 186 LDV 83 LDA 187 LDXA 85 LDV 188 LDV 100 if (atWork[student]) 195 86 LDC 189 BZE 88 CGT 191 LDA 1 89 BZE 95 if (student > StudentsInClass) 193 LDV 91 PRNS "Invalid student number" 194 PRNI write(student); 93 BRN 159 195 LDA 95 LDA 197 I DA 2 1 97 LDV 199 LDV 98 BZE 130 else if (clockingIn) 200 LDC 1 100 LDA 202 ADD 102 LDV 203 STO student = student + 1; } // while (student <= 100</pre> 103 LDA 204 BRN 173 105 LDV 206 HALT ; System.Exit() ``` Task 8 - Your lecturer is quite a character To be able to deal with input and output of character data we need to add two new opcodes, modelled on the INPI and PRNI codes whose interpretation would be as below. All of the new opcodes require additions to the lists of opcodes in the assembler and interpreter (be careful of two-word opcodes; they crop up in several places). Note that the output of numbers was arranged to have a leading space; this is not as pretty when you see i t a p p lied to characters, is it-which is why the call to results.write uses a second argument of 1, not 0 (this argument could have been omitted). Note the use of the modulo arithmetic to make quite sure that only sensible ASCII characters will be printed: ``` case PVM.inpc: // character input adr = Pop(); if (InBounds(adr)) { mem[adr] = data.ReadChar(); if (data error()) ps = badData; break; case PVM.prnc: // character output if (tracing) results write(padding); results.Write((char) (Math.Abs(Pop()) % (maxChar + 1)), 1); if (tracing) results.WriteLine(); break; or for the "inline" assembler case PVM inpo: // character input mem[mem[cpu.sp++]] = data.ReadChar(); case PVM.prnc: // character output if (tracing) results Write(padding); results.Write((char) (Math.Abs(mem[cpu.sp++]) % (maxChar + 1)), 1); if (tracing) results.WriteLine(); ``` To build a really safe system there are further refinements we could make. It can be argued that we should not try to store a value outside of the range 0 .. 255 into a character variable. This suggests that we should have a range of STO type instructions that check the value on the top of stack before assigning it. One of these - STOC to act as a variation on STO - would be interpreted as follows; we would need another to handle STLC and so on (these have not yet been implemented in the solution kit). Introducing opcodes to convert to lower or upper case is simply done by using the methods from the C# Char wrapper class (notice the need for casting operations as well, to satisfy the C# compiler): ``` case PVM.low: // toLowerCase Push(Char ToLower((char) Pop())); break: case PVM.cap: // toUpperCase Push(Char.ToUpper((char) Pop())); or for the "inline" assembler - note that cpu.sp is left unaltered. case PVM.low: // toLowerCase mem[cpu.sp] = Char.ToLower((char) mem[cpu.sp]); break; case PVM.cap: // toUpperCase mem[cpu.sp] = Char.ToUpper((char) mem[cpu.sp]); break: ``` The INC and DEC operations are best performed by introducing opcodes that assume that an *address* has been planted on the top of stack for the variable (or array element) that needs to be incremented or decremented. This may not have been apparent to everyone, but consider (as hinted in the prac sheet) a statement like a[i+j]++; With all these in place the string reversal algorithm can be programmed as follows: ``` ; Read a sentence and reverse in UPPER CASE 30 LDC ; P.D. Terry, Rhodes University, 2017 SUB ; char[] sentence is v0; leng is v1 33 LDXA ; original opcode set 34 LDV 46; // '.' is 46 in ASCII table 35 LDC 0 DSP 2 37 CEQ 2 LDA 38 BZE 13; until (sentence[leng-1] = ' '); 4 LDC 256; 40 LDA 42 6 ANFW LDV 7 STO sentence = new char[256]; 43 LDC 0 8 LDA 45 CGT while (leng > 0) { 10 LDC 46 BZE 63; leng = 0; 48 12 1 ; STO LDA 13 LDA 0 repeat { 50 DEC leng--; 0 15 LDV 51 LDA 16 LDA 53 LDV 54 18 LDV LDA 19 LDXA 56 LDV 20 57 INPC read(sentence[leng]); LDXA 21 LDA 58 LDV 23 59 CAP INC leng++; 24 LDA 60 PRNC write(upper(sentence[leng]); 40 ; } 61 BRN 26 LDV 27 ; System Exit() LDA 63 HALT LDV 29 ``` Task 9 - Improving the opcode set still further Once again, adding the LDL N and STL N opcodes is very easy. Unfortunately, it is easy to leave some of the changes out and get a corrupted solution. The PVMAsm class requires modification in the *switch* statement that recognizes two-word opcodes: The PVM class requires several additions. We must add to the *switch* statement in the Trace and ListCode methods (several submissions missed this): ``` static void Trace(OutFile results, int pcNow, bool traceStack, bool traceHeap) { switch (cpu.ir) { ... case PVM.ldl: // ++++++++++++++ addition case PVM.stl: // ++++++++++++++++ addition } results.WriteLine(); } ``` and we must provide case arms for all the new opcodes. A selection of these follows; the rest can be seen in the solution kit. Notice that for consistency all the "inBounds" checks should really be performed on the new opcodes too (several submissions missed this, and they have been left out here too so that you can add them yourselves). Firstly the basic two-word ones: // store local value case PVM.stl: A great many submissions made a rather bizarre error. Part of the original kit read as follows - where the action for all the "missing" opcodes was to trap an error if they were encountered (by accident?) mem[cpu.fp - 1 - mem[cpu.pc++]] = mem[cpu.sp++]; Incompletely modifying the code on the lines shown below would have had the effect of adding PVM.lda\_2, PVM.lda 3 as "extra" labels to the PVM.ldl clause (and similarly for other cases)! In improving the string reversal program, some people forgot to introduce the LDL and STL wherever they could, did not incorporate CAP and INC/DEC and ran the last loop the wrong way! If one codes carefully, this program reduces to the code shown below: ``` ; Read a sentence and reverse in UPPER CASE 17 SUB P.D. Terry, Rhodes University, 2017 18 LDXA char[] sentence is v0; leng is v1 19 LDV // '.' is 46 in ASCII table 46 ; extended opcode set 20 LDC 22 CEQ DSP 23 BZE 8 until (sentence[leng-1] = '.'); LDL_1 LDC LDC_0 ANEW sentence = new char[256]; 26 STL\_0 27 CGT while (leng > 0) { 6 7 28 40 LDC_0 BZE STL_1 LDL_0 LDL_1 LDXA leng = 0; 30 LDA_1 31 repeat { leng--; DEC LDL 0 33 INPC read(sentence[leng]); LDXA 12 13 LDA_1 INC 35 LDV 36 lena++; CAP LDL_0 37 14 PRNC write(upper(sentence[leng]); 15 LDL_1 LDC_1 BRN 25 38 ; System Exit(); ``` #### Task 10 - Execution overheads - part two In the prac kit you were supplied with a second translation SIEVE2.PVM of a cut down version of the same prime-counting program SIEVE.PAV as was used in Task 4, but this time using the extended opcode set developed in the last task. The kit also included the code that could be executed if the PVM were extended still further on the lines of the suggestions on page 44 of the textbook. Running SIEVE1.PVM through both of the original and modified assemblers, and SIEVE2.PVM and SIEVE3.PVM through both of the modified assemblers gave the following timings for the same limit (4000) and number of iterations (100) on my machines, one a laptop running Windows XP and one a desktop running Windows 7-32. | Desktop Machine (Win 7-32) | Sieve1.pvm | (1.00) | Sieve2.pvm | (0.78) | Sieve3.pvm | (0.75) | |----------------------------------|--------------|--------|--------------|--------|--------------|--------| | ASM1 (Push/Pop)<br>ASM2 (Inline) | 0.73<br>0.30 | (0.41) | 0.57<br>0.20 | (0.36) | 0.55<br>0.13 | (0.24) | | | | | | | | | | Laptop machine (XP-32) | Sieve1.pvm | (1.00) | Sieve2.pvm | (0.75) | Sieve3.pvm | (0.67) | The Desktop times were about 65% of those on the slower Laptop. The Inline times were about 40% of the Push/Pop system with the original limited opcode set. The Inline times were about 30% of the Push/Pop system with the extended opcode set, The reasons are not hard to find. The InLine emulator makes very few function calls within the fetch-execute cycle, whereas the Push/Pop one makes a very large number, each carrying an extra overhead. Similarly, the introduction of the LDL and STL codes allowed for fewer opcodes to be interpreted to achieve the desired result. If one wishes to improve the performance of the interpreter further it might make sense to get some idea of which opcodes are executed most often. Clearly this will depend on the application, and so a mix of applications might need to be analysed. It is not difficult to add a profiling facility to the interpreter, and this has been done in yet another interpreter that you can find in the solution kit. Running this on the Sieve files yielded some interesting results. For a start, there were enormous numbers of steps executed - probably more than you might think. | Original opcodes | Extended opcode set<br>LDL and STL used | Extended opcode set LDL, STL, LDL_x STL_x | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | 39 494 323 operations. | 27 070 118 operations. (68%) | ) (same op count) | | LDA 10824405 LDV 9386302 LDC 4948605 STO 3165703 BZE 2182801 CLE 1782901 ADD 1782701 BRN 1727700 LDXA 1727600 CGT 982800 AND 982800 HALT 1 ANEW 1 PRNS 1 PRNI 1 DSP 1 | LDL 8186502 LDC 4148705 BZE 2182801 CLE 1782901 BRN 1727700 LDXA 1727600 STO 1327700 STL 1038103 ADD 982801 AND 982800 CGT 982800 LDA 799900 LDV 399900 LDV 399900 DSP 1 PRNI 1 PRNS 1 ANEW 1 HALT 1 | LDL_2 3821200 LDL_1 2582600 BZE 2182801 LDC_0 1910701 LDC 1782902 CLE 1782901 BRN 1727700 LDL_0 1727600 LDXA 1727600 STO 1327700 ADD 982801 STL_2 982800 AND 982800 INC 799900 LDA_1 799800 LDC_1 454902 LDV 399900 STL_3 454902 LDV 399900 STL_55101 LDL 55001 LDC_2 200 STL_1 200 LDL_3 101 LDC_2 200 STL_1 200 LDL_3 101 LDC_3 101 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_1 454902 LDV 399900 STL_1 55101 LDL 55001 LDC_2 100 LDC_1 100 LDC_2 100 LDC_1 100 LDC_2 100 LDC_1 100 LDC_2 100 LDC_1 100 LDC_1 100 LDC_2 100 LDC_1 |