Sources of full solutions for these problems may be found on the course web page as the file PRAC25A.ZIP (Java) or PRAC25AC.ZIP (C#).
The extra pragmas needed in the refined Parva compiler are easily introduced. We need some static fields:
public static boolean debug = false, * shortCirc = true, * listCode = false, warnings = true;
The definitions of the pragmas are done in terms of these:
PRAGMAS * CodeOn = "$C+" . (. listCode = true; .) * CodeOff = "$C-" . (. listCode = false; .) DebugOn = "$D+" . (. debug = true; .) DebugOff = "$D-" . (. debug = false; .) * ShortCircOn = "$S+" . (. shortCirc = true; .) * ShortCircOff = "$S-" . (. shortCirc = false; .) WarnOn = "$W+" . (. warnings = true; .) WarnOff = "$W-" . (. warnings = false; .)
It is convenient to be able to set the options with command line parameters as well. This involves a straightforward change to the Parva.frame file:
for (int i = 0; i < args.length; i++) { if (args[i].toLowerCase().equals("-l")) mergeErrors = true; else if (args[i].toLowerCase().equals("-d")) Parser.debug = true; else if (args[i].toLowerCase().equals("-w")) Parser.warnings = false; * else if (args[i].toLowerCase().equals("-c")) Parser.listCode = true; * else if (args[i].toLowerCase().equals("-s")) Parser.shortCirc = false; else inputName = args[i]; } if (inputName == null) { System.err.println("No input file specified"); * System.err.println("Usage: Parva [-l] [-d] [-w] [-c] [-s] source.pav [-l] [-d] [-w] [-c] [-s]"); System.err.println("-l directs source listing to listing.txt"); System.err.println("-d turns on debug mode"); System.err.println("-w suppresses warnings"); * System.err.println("-c lists object code (.cod file)"); * System.err.println("-s turn off short-circuit evaluation"); System.exit(1); }
Finally, the following change to the frame file gives the option of suppressing the generation of the .COD listing.
* if (Parser.listCode) PVM.listCode(codeName, codeLength);
The problem called for a trivial change to the definition of the identifier token so as to prevent an identifier from having an underscore as its final character:
TOKENS * identifier = letter { letter | digit | "_" { "_" } ( letter | digit ) } . // or identifier = letter { { "_" } (letter | digit ) } .
To be as sympathetic as possible in the face of confusion between various operators is easily achieved - we make the sub-parsers that identify these operators accept the incorrect ones, at the expense of generating an error message (or, if you want to be really kind, issue a warning only):
EqualOp<out int op> (. op = CodeGen.nop; .) = "==" (. op = CodeGen.ceq; .) | "!=" (. op = CodeGen.cne; .) * | "=" (. SemError("== intended?"); .) * | "<>" (. SemError("!= intended?"); .) . AssignOp * = "=" * | ":=" (. SemError("= intended?"); .) .
Similarly, recovering from the spurious introduction of then into an IfStatement is quite easily achieved:
IfStatement<StackFrame frame> (. Label falseLabel = new Label(!known); .) = "if" "(" Condition ")" (. CodeGen.branchFalse(falseLabel); .) * [ "then" (. SemError("then is not used in Parva"); .) ] Statement<frame> (. falselabel.here(); .) .
The problem called for the introduction of the final keyword in variable declarations, so as to allow code of the form:
final int max = 2; int i = 5; final int iPlusMax = i + max; final int[] list = new int[iPlusMax];
The key to this is to add an extra field to those in the Entry class used by the table handling routines:
class Entry { public static final int Con = 0, // identifier kinds Var = 1, Fun = 2, noType = 0, // identifier (and expression) types. The numbering is nullType = 2, // significant as array types are denoted by these intType = 4, // numbers + 1 boolType = 6, voidType = 8; public int kind; public String name; public int type; public int value; // constants public int offset; // variables public Entry nextInScope; // link to next entry in current scope public boolean declared; // true for all except sentinel entry * public boolean canChange; // true except for constants } // end Entry
A similar addition is needed in the DesType class
class DesType { // Objects of this type are associated with l-value and r-value designators public Entry entry; // the identifier properties public int type; // designator type (not always the entry type) * public boolean canChange; // false if entry is marked constant public DesType(Entry entry) { this.entry = entry; this.type = entry.type; * this.canChange = entry.canChange; } } // end DesType
With these in place the parsers for handling VarDeclarations can set the fields correctly:
VarDeclarations<StackFrame frame> (. int type; boolean canChange = true; .) * = [ "final" (. canChange = false; .) * ] Type<out type> * OneVar<frame, type, canChange> * { WEAK "," OneVar<frame, type, canChange> } WEAK ";" . OneVar<StackFrame frame, int type, boolean canChange> (. int expType; Entry var = new Entry(); .) = Ident<out var.name> (. var.kind = Entry.Var; var.type = type; * var.canChange = canChange; var.offset = frame.size; frame.size++; .) ( AssignOp (. CodeGen.loadAddress(var); .) Expression<out expType> (. if (!compatible(var.type, expType)) SemError("incompatible types in assignment"); CodeGen.assign(var.type); .) * | (. if (!canChange) * SemError("defining expression required"); .) ) (. Table.insert(var); .) .
where, it should be noted, it is an error to apply final to a variable declaration if the initial definition of a value for that variable is omitted. The canChange field is then checked at the points where one might attempt to alter a variable marked as final, namely within AssignmentStatements and ReadElements:
Assignment (. int expType; DesType des; .) = Designator<out des> (. if (des.entry.kind != Entry.Var) SemError("invalid assignment"); * if (!des.canChange) * SemError("may not alter this variable"); AssignOp Expression<out expType> (. if (!compatible(des.type, expType)) SemError("incompatible types in assignment"); CodeGen.assign(des.type); .) . ReadElement (. String str; DesType des; .) = StringConst<out str> (. CodeGen.writeStr(str); .) | Designator<out des> (. if (des.entry.kind != Entry.Var) SemError("wrong kind of identifier"); * if (!des.canChange) * SemError("may not alter this variable"); switch (des.type) { case Entry.intType: case Entry.boolType: CodeGen.read(des.type); break; default: SemError("cannot read this type"); break; } .) .
There is one further subtlety. Marking an "array" as final implies only that the reference to the array may not be altered, not that individual elements may not be altered. This necessitates a tweak to the Designator parser:
Designator<out DesType des> (. String name; int indexType; .) = Ident<out name> (. Entry entry = Table.find(name); if (!entry.declared) SemError("undeclared identifier"); des = new DesType(entry); if (entry.kind == Entry.Var) CodeGen.loadAddress(entry); .) * [ "[" (. des.canChange = true; if (isRef(des.type)) des.type--; else SemError("unexpected subscript"); if (entry.kind != Entry.Var) SemError("unexpected subscript"); CodeGen.dereference(); .) Expression<out indexType> (. if (!isArith(indexType)) SemError("invalid subscript type"); CodeGen.index(); .) "]" ] .
This is easier than might at first appear. The code generation routine for the new opcode becomes
public static void writeStr(String str) { // Generates code to output str after storing it in the string pool emit(PVM.prnsp); emit(PVM.addString(str)); }
Changes are needed in the PVM emulator. Firstly, it can maintain the string pool by using an ArrayList as shown here - note that duplicate strings are not added to the pool a second time.
public static ArrayList<String> strPool = new ArrayList<String>(); public static int addString(String str) { // Adds str to string pool if not already there, returns its index int i = 0; for (i = 0; i < strPool.size(); i++) if (str.equals(strPool.get(i))) return i; /* if not found */ strPool.add(str); return i; }
The interpretation of the PVM.prnsp opcode has to alter:
case PVM.prnsp: // string output from string pool if (tracing) results.write(padding); results.write(strPool.get(next())); if (tracing) results.writeLine(); break;
The part of the ListCode method responsible for displaying strings must decode the string in a different way.
case PVM.prnsp: i = (i + 1) % memSize; codeFile.write(mem[i]); codeFile.write(" ( \""); String str = strPool.get(mem[i]); for (j = 0; j < str.length(); j++) switch (str.charAt(j)) { case '\\' : codeFile.write("\\\\"); break; case '\"' : codeFile.write("\\\""); break; case '\'' : codeFile.write("\\\'"); break; case '\b' : codeFile.write("\\b"); break; case '\t' : codeFile.write("\\t"); break; case '\n' : codeFile.write("\\n"); break; case '\f' : codeFile.write("\\f"); break; case '\r' : codeFile.write("\\r"); break; default : codeFile.write(str.charAt(j)); break; } codeFile.write("\" )"); break;
To allow for a character type involves one in a lot of straightforward alterations, as well as some more elusive ones. Firstly, we extend the definition of a symbol table entry:
class Entry { public static final int Con = 0, // identifier kinds Var = 1, Fun = 2, noType = 0, // identifier (and expression) types. The numbering is nullType = 2, // significant as array types are denoted by these intType = 4, // numbers + 1 boolType = 6, * charType = 8, * voidType = 10; ... } // end Entry
The Table class requires a similar small change to introduce the new type name needed if the symbol table is to be displayed:
* typeNames.Add("char"); * typeNames.Add("char[]");
A minor change to the Constant production is needed to allow character literals to be regarded as of the new charType:
Constant<out ConstRec con> (. con = new ConstRec(); .) = IntConst<out con.value> (. con.type = Entry.intType; .) * | CharConst<out con.value> (. con.type = Entry.charType; .) | "true" (. con.type = Entry.boolType; con.value = 1; .) | "false" (. con.type = Entry.boolType; con.value = 0; .) | "null" (. con.type = Entry.nullType; con.value = 0; .) .
Reading and writing single characters is easy:
ReadElement (. String str; DesType des; .) * = StringConst<out str> (. CodeGen.writeStr(str); .) | Designator<out des> (. if (des.entry.kind != Entry.Var) SemError("wrong kind of identifier"); * if (!des.canChange) * SemError("may not alter this variable"); switch (des.type) { case Entry.intType: case Entry.boolType: * case Entry.charType: CodeGen.read(des.type); break; default: SemError("cannot read this type"); break; } .) . WriteElement (. int expType; String str; .) * = StringConst<out str> (. CodeGen.writeStr(str); .) | Expression<out expType> (. switch (expType) { case Entry.intType: case Entry.boolType: * case Entry.charType: CodeGen.write(expType); break; default: SemError("cannot write this type"); break; } .) .
The associated code generating methods require matching additions:
public static void read(int type) { // Generates code to read a value of specified type // and store it at the address found on top of stack switch (type) { case Entry.intType: emit(PVM.inpi); break; case Entry.boolType: emit(PVM.inpb); break; * case Entry.charType: emit(PVM.inpc); break; } } public static void write(int type) { // Generates code to output value of specified type from top of stack switch (type) { case Entry.intType: emit(PVM.prni); break; case Entry.boolType: emit(PVM.prnb); break; * case Entry.charType: emit(PVM.prnc); break; } }
The major part of this exercise was concerned with the changes needed to apply various constraints on operands of the char type. Essentially it ranks as an arithmetic type, in that expressions of the form
character + character
character > character
character + integer
character > integer
are all allowable. This can be handled by modifying the helper methods in the parser as follows:
static boolean isArith(int type) { * return type == Entry.intType || type == Entry.charType || type == Entry.noType; } * static boolean compatible(int typeOne, int typeTwo) { * // Returns true if typeOne is compatible (and comparable for equality) with typeTwo return typeOne == typeTwo * || isArith(typeOne) && isArith(typeTwo) || typeOne == Entry.noType || typeTwo == Entry.noType || isRef(typeOne) && typeTwo == Entry.nullType || isRef(typeTwo) && typeOne == Entry.nullType; }
However, assignment compatibility is more restrictive. Assignments of the form
integer = integer expression
integer = character expression
character = character expression
are allowed, but
character = integer expression
is not allowed. This may be checked within the Assignment production with the aid of a further helper method assignable:
* static boolean assignable(int typeOne, int typeTwo) { * // Returns true if typeOne may be assigned a value of typeTwo * return typeOne == typeTwo * || typeOne == Entry.intType && typeTwo == Entry.charType * || typeOne == Entry.noType || typeTwo == Entry.noType * || isRef(typeOne) && typeTwo == Entry.nullType; * }
The assignable() function call now takes the place of the compatible() function call in the many places in OneVar and Assignment where, previously, calls to compatible() appeared.
We turn finally to consideration of the changes needed to the various sub-parsers for expressions.
A casting mechanism is introduced to handle the situations where it is necessary explicitly to convert integer values to characters, so that
character = (char) integer
is allowed, and for completeness, so are
integer = (int) character
integer = (char) character
character = (char) character
Casting operations are accompanied by a type conversion; the (char) cast also introduces the generation of code for checking that the integer value to be converted lies within range.
This is all handled within the Primary production, which has to be factored to deal with the potential LL(1) trap in distinguishing between components of the form "(" "char" ")" and "(" Expression ")":
Primary<out int type> (. type = Entry.noType; int size; DesType des; ConstRec con; .) = Designator<out des> (. type = des.type; switch (des.entry.kind) { case Entry.Var: CodeGen.dereference(); break; case Entry.Con: CodeGen.loadConstant(des.entry.value); break; default: SemError("wrong kind of identifier"); break; } .) | Constant<out con> (. type = con.type; CodeGen.loadConstant(con.value); .) | "new" BasicType<out type> (. type++; .) "[" Expression<out size> (. if (!isArith(size)) SemError("array size must be integer"); CodeGen.allocate(); .) "]" * | "(" * ( "char" ")" * Factor<out type> (. if (!isArith(type)) * SemError("invalid cast"); * else type = Entry.charType; * CodeGen.castToChar(); .) * | "int" ")" * Factor<out type> (. if (!isArith(type)) * SemError("invalid cast"); * else type = Entry.intType; .) * | Expression<out type> ")" * ) .
Strictly speaking the above grammar departs slightly from the Java version, where the casting operator is regarded as weaker than the parentheses around an Expression, but in practice it makes little difference.
Various of the other productions need modification. The presence of an arithmetic operator correctly placed between character or integer operands must result in the sub-expression so formed being of integer type (and never of character type). So, for example:
MultExp<out int type> (. int type2; int op; .) = Factor<out type> { MulOp<out op> Factor<out type2> (. if (!isArith(type) || !isArith(type2)) { SemError("arithmetic operands needed"); type = Entry.noType; } * else type = Entry.intType; CodeGen.binaryOp(op); .) } .
Similarly a prefix + or - operator applied to an integer or character Factor creates a new factor of integer type (see full solution for details).
The extra code generation method we need is as follows:
public static void castToChar() { // Generates code to check that TOS is within the range of the character type emit(PVM.i2c); }
and within the switch statement of the emulator method we need:
case PVM.i2c: // check convert character to integer if (mem[cpu.sp] < 0 || mem[cpu.sp] > maxChar) ps = badVal; break;
The interpreter has another opcode for checked storage of characters, but if the i2c opcodes are inserted correctly it appears that we do not really need stoc:
case PVM.stoc: // character checked store tos = pop(); adr = pop(); if (inBounds(adr)) if (tos >= 0 && tos <= maxChar) mem[adr] = tos; else ps = badVal; break;
This is easily implemented as follows, where we have also shown how the feature is controlled by the Boolean flag set by the pragma:
Expression<out int type> (. int type2; Label shortcircuit = new Label(!known); .) = AndExp<out type> * { "||" (. if (shortCirc) CodeGen.booleanOp(shortcircuit, CodeGen.or); .) AndExp<out type2> (. if (!isBool(type) || !isBool(type2)) SemError("Boolean operands needed"); * if (!shortCirc) CodeGen.binaryOp(CodeGen.or); type = Entry.boolType; .) } (. shortcircuit.here(); .) . AndExp<out int type> (. int type2; Label shortcircuit = new Label(!known); .) = EqlExp<out type> * { "&&" (. if (shortCirc) CodeGen.booleanOp(shortcircuit, CodeGen.and); .) EqlExp<out type2> (. if (!isBool(type) || !isBool(type2)) SemError("Boolean operands needed"); * if (!shortCirc) CodeGen.binaryOp(CodeGen.and); type = Entry.boolType; .) } (. shortcircuit.here(); .) .
It might not at first have been obvious, but hopefully everyone eventually saw that this extension is handled by clever modifications to the Assignment production, which has to be factorized in such a way as to avoid LL(1) conflicts. The code below achieves all this (including the tests for compatibility that some students probably omitted) by assuming the existence of a few new machine opcodes, as suggested in the textbook.
Assignment (. int expType; DesType des; * boolean inc = true; .) = Designator<out des> (. if (des.entry.kind != Entry.Var) SemError("invalid assignment"); if (!des.canChange) SemError("may not alter this variable"); .) ( AssignOp Expression<out expType> (. if (!assignable(des.type, expType)) SemError("incompatible types in assignment"); CodeGen.assign(des.type); .) * | ( "++" | "--" (. inc = false; .) * ) (. if (!isArith(des.type)) * SemError("arithmetic type needed"); * CodeGen.incOrDec(inc, des.type); .) * ) * | ( "++" | "--" (. inc = false; .) * ) Designator<out des> (. if (des.entry.kind != Entry.Var) * SemError("variable designator required"); * if (!des.canChange) * SemError("may not alter this variable"); * if (!isArith(des.type)) * SemError("arithmetic type needed"); * CodeGen.incOrDec(inc, des.type); .) .
The extra code generation routine is straightforward, but note that we need to cater for characters specially
public static void incOrDec(boolean inc, int type) { // Generates code to increment the value found at the address currently // stored at the top of the stack. // If necessary, apply character range check * if (type == Entry.charType) emit(inc ? PVM.incc : PVM.decc); * else emit(inc ? PVM.inc : PVM.dec); }
As usual, the extra opcodes in the PVM make all this easy to achieve at run time. Some submissions might have forgotten to include the check that the address was "in bounds". I suppose one could argue that if the source program were correct, then the addresses could not go out of bounds, but if the interpreter were to be used in conjunction with a rather less fussy assembler (as we had in earlier practicals) it would make sense to be cautious.
case PVM.inc: // int ++ adr = pop(); if (inBounds(adr)) mem[adr]++; break; case PVM.dec: // int -- adr = pop(); if (inBounds(adr)) mem[adr]--; break; case PVM.incc: // char ++ adr = pop(); if (inBounds(adr)) if (mem[adr] < maxChar) mem[adr]++; else ps = badVal; break; case PVM.decc: // char -- adr = pop(); if (inBounds(adr)) if (mem[adr] > 0) mem[adr]--; else ps = badVal; break;
The syntax of the BreakStatement is, of course, trivial. The catch is that one has to allow these statements only in the context of loops. To find a context-free grammar with this restriction is not worth the effort.
One approach that incorporates context-sensitive checking in conjunction with code generation is based on passing labels as arguments to various subparsers. We change the parser for Statement and for Block as follows:
* Statement<StackFrame frame, Label breakLabel> * = SYNC ( Block<frame, breakLabel> | ConstDeclarations | VarDeclarations<frame> | AssignmentStatement * | IfStatement<frame, breakLabel> | WhileStatement<frame> | ForStatement<frame> | SwitchStatement<frame> * | BreakStatement<breakLabel> | HaltStatement | ReturnStatement | ReadStatement | WriteStatement | ";" | "stackdump" ";" (. if (debug) CodeGen.dump(); .) ) . * Block<StackFrame frame, Label breakLabel> = (. Table.openScope(); .) * "{" { Statement<frame, breakLabel> } WEAK "}" (. if (debug) Table.printTable(OutFile.StdOut); Table.closeScope(); .) .
and the parsers for the statements that are concerned with looping, breaking, and making decisions become
IfStatement<StackFrame frame, Label breakLabel> (. Label falseLabel = new Label(!known); .) = "if" "(" Condition ")" (. CodeGen.branchFalse(falseLabel); .) [ "then" (. SemError("then is not used in Parva"); .) * ] Statement<frame, breakLabel> (. falseLabel.here(); .) . WhileStatement<StackFrame frame> (. Label loopStart = new Label(known); .) = "while" "(" Condition ")" (. Label loopExit = new Label(!known); CodeGen.branchFalse(loopExit); .) * Statement<frame, loopExit> (. CodeGen.branch(loopStart); loopExit.here(); .) . BreakStatement<Label breakLabel> * = "break" (. if (breakLabel == null) * SemError("break is not allowed here"); * else CodeGen.branch(breakLabel); .) * WEAK ";" .
Many languages provide for a ForStatement in one or other form. Although most people are familiar with these, their semantics and implementation can actually be quite tricky. The exercises suggested implementing each of two possible forms of ForStatement.
The full solution, by carefully factoring the grammar, shows how both forms might be possible within one language definition. For the purposes of discussion it is convenient to treat each separately.
A possible syntax for a ForStatement, suggested by the C family of languages, might be:
ForStatement = "for" ([ [BasicType] Ident "=" Expression ] ";" [ Condition ] ";" [ Assignment] ) Statement .
The semantics of this statement are deemed to be essentially equivalent to
Ident "=" Expression; while ( Condition ) { Statement; Assignment; }
The solution follows:
ForStatement<StackFrame frame> (. int type = Entry.noType, expType; boolean local = false; boolean canChange; Label loopExit = new Label(!known); Label loopContinue = new Label(!known); Label loopBody = new Label(!known); Entry var = new Entry(); String name; .) = "for" "(" (. Table.openScope(); .) [ [ BasicType<out type> (. local = true; .) ] Ident<out name> (. if (local) { var.name = name; var.kind = Entry.Var; var.type = type; var.offset = frame.size; var.canChange = true; frame.size++; } else { var = Table.find(name); if (!var.declared) SemError("undeclared identifier"); if (var.kind != Entry.Var) SemError("illegal control variable"); if (!var.canChange) SemError("may not alter this variable"); } if (!isArith(var.type)) SemError("illegal type of control variable"); .) ( "=" | ":=" (. SemError("= intended?"); .) ) (. CodeGen.loadAddress(var); .) Expression<out expType> (. if (!assignable(var.type, expType)) SemError("incompatible types in assignment"); CodeGen.assign(var.type); if (local) Table.insert(var); .) ] ";" (. Label testLabel = new Label(known); .) ( Condition | (. CodeGen.loadConstant(1); .) ) (. CodeGen.branchFalse(loopExit); CodeGen.branch(loopBody); .) ";" (. loopContinue.here(); .) [ Assignment ] (. CodeGen.branch(testLabel); .) ")" (. loopBody.here(); canChange = var.canChange; var.canChange = false; .) Statement<frame, loopExit> (. CodeGen.branch(loopContinue); loopExit.here(); var.canChange = canChange; Table.closeScope(); .)
Notes
A statement like
for ( i = initial ; condition; assignment ) statement
generates the code
LDA i initial STO L1 condition BZE L4 (exit) BRN L3 (body) L2 assignment BRN L1 (condition) L3 statement BRN L2 (assignment) L4 ...
This exercise suggested adding a simple Pascal-style ForStatement to Parva, to allow statements whose concrete syntax is defined by
ForStatement = "for" Ident "=" Expression ("to" | "downto") Expression Statement .
The problem as posed already suggested part of a solution. Pascal was a much "safer" language than C, and the semantics of the Pascal-style ForStatement are better described as follows. The statements
for Control = Expression1 to Expression2 Statement for Control = Expression1 downto Expression2 Statement
should be regarded as more closely equivalent to
Temp1 := Expression1 Temp1 := Expression1 Temp2 := Expression2 Temp2 := Expression2 IF Temp1 > Temp2 THEN GOTO EXIT IF Temp1 < Temp2 THEN GOTO EXIT Control := Temp1; Control := Temp1; BODY: Statement BODY: Statement IF Control = Temp2 THEN GOTO EXIT IF Control = Temp2 THEN GOTO EXIT Control := Control + 1 Control := Control - 1 GOTO BODY GOTO BODY EXIT: EXIT:
These arrangements can be handled by the following parsing method
ForStatement<StackFrame frame> (. int expType; boolean canChange; Label loopBody = new Label(!known); Label loopExit = new Label(!known); Entry var = new Entry(); String name; .) = "for" Ident<out name> (. var = Table.find(name); if (!var.declared) SemError("undeclared identifier"); if (var.kind != Entry.Var) SemError("illegal control variable"); if (!var.canChange) SemError("may not alter this variable"); CodeGen.loadAddress(var); canChange = var.canChange; var.canChange = false; if (!isArith(var.type)) SemError("control variable must be of arithmetic type"); .) ( "=" | ":=" (. SemError("= intended?"); .) ) Expression<out expType> (. if (!assignable(var.type, expType)) SemError("incompatible with control variable"); boolean up = true; .) ( "to" | "downto" (. up = false; .) ) Expression<out expType> (. if (!assignable(var.type, expType)) SemError("incompatible with control variable"); CodeGen.startForLoop(up, loopExit); loopBody.here(); .) Statement<frame, loopExit> (. CodeGen.endForLoop(up, loopBody); var.canChange = canChange; loopExit.here(); CodeGen.pop(3); .) .
The code generation routines, as usual, are quite simple:
public static void startForLoop(boolean up, Label destination) { // Generates prologue test for a for loop (either up or down) if (up) emit(PVM.sfu); else emit(PVM.sfd); emit(destination.address()); } public static void endForLoop(boolean up, Label destination) { // Generates epilogue test and increment/decrement for a for loop (either up or down) if (up) emit(PVM.efu); else emit(PVM.efd); emit(destination.address()); } public static void pop(int n) { // Generates code to discard top n elements from the stack emit(PVM.dsp); emit(-n); }
but the magic that makes this work is contained in the interpreter with opcodes that are probably more complex than others you have seen to this point:
case PVM.sfu: // start for loop "to" if (mem[cpu.sp + 1] > mem[cpu.sp]) cpu.pc = mem[cpu.pc]; // goto exit else { mem[mem[cpu.sp + 2]] = mem[cpu.sp + 1]; cpu.pc++; // assign to control } break; case PVM.sfd: // start for loop "downto" // goto exit if (mem[cpu.sp + 1] < mem[cpu.sp]) cpu.pc = mem[cpu.pc]; else { mem[mem[cpu.sp + 2]] = mem[cpu.sp + 1]; cpu.pc++; // assign to control } break; case PVM.efu: // end for loop "to" if (mem[mem[cpu.sp + 2]] == mem[cpu.sp]) cpu.pc++; // loop complete else { mem[mem[cpu.sp + 2]]++; cpu.pc = mem[cpu.pc]; // increment control } break; case PVM.efd: // end for loop "downto" if (mem[mem[cpu.sp + 2]] == mem[cpu.sp]) cpu.pc++; // loop complete else { mem[mem[cpu.sp + 2]]--; cpu.pc = mem[cpu.pc]; // decrement control } break;
Notes
for i = 35 to 64 write(i); // loop to be executed 30 times ----------------------------------------------------------------- ... | ... | ... | ... | 64 | 35 | adr i | .... -----------------------------------------------------------------
for i = initial to final statement
which generates code
LDA i initial final SFU L3 L1 statement L2 EFU L1 L3 DSP -3
ForStatement = "for" Ident "=" Expression ("to" | "downto") Expression "do" Statement .
This exercise called on you to extend Parva to adopt an idea used in Pascal, where a statement like
write(X : 5, X + A : 12, X - Y : 2 * N);
will write the values of X, X+A and X-Y in fields of widths 5, 12 and 2*N respectively. This is easily handled by modifying the production for WriteElement:
WriteElement (. int expType, formType; boolean formatted = false; String str; .) = StringConst<out str> (. CodeGen.writeStr(str); .) | Expression<out expType> (. if (!(isArith(expType) || expType == Entry.boolType)) SemError("cannot write this type"); .) * [ ":" Expression<out formType> (. if (formType != Entry.intType) * SemError("fieldwidth must be integral"); * formatted = true; .) * ] (. switch (expType) { case Entry.intType: case Entry.boolType: case Entry.charType: * CodeGen.write(expType, formatted); break; default: break; } .) .
and modifying the code generation routine
public static void write(int type, boolean formatted) { // Generates code to output value of specified type from top of stack * if (formatted) * switch (type) { * case Entry.intType: emit(PVM.prnfi); break; * case Entry.boolType: emit(PVM.prnfb); break; * case Entry.charType: emit(PVM.prnfc); break; * } else switch (type) { case Entry.intType: emit(PVM.prni); break; case Entry.boolType: emit(PVM.prnb); break; case Entry.charType: emit(PVM.prnc); break; } }
and new opcodes whose interpretation is
case PVM.prnfi: // integer output formatted if (tracing) results.write(padding); fieldWidth = pop(); results.write(pop(), fieldWidth); if (tracing) results.writeLine(); break; case PVM.prnfb: // boolean output formatted if (tracing) results.write(padding); fieldWidth = pop(); if (pop() != 0) results.write(" true ", fieldWidth); else results.write(" false ", fieldWidth); if (tracing) results.writeLine(); break; case PVM.prnfc: // character output formatted if (tracing) results.write(padding); fieldWidth = pop(); results.write((char) (Math.abs(pop()) % (maxChar + 1)), fieldWidth); if (tracing) results.writeLine(); break;
The exercise suggested an implementation of a SwitchStatement described by the productions:
SwitchStatement = "switch" "(" Expression ")" "{" { CaseLabelList Statement { Statement } } [ "default" ":" { Statement } ] "}" . CaseLabelList = CaseLabel { CaseLabel } . CaseLabel = "case" [ "+" | "-" ] Constant ":" .
as exemplified by
switch (i + j) { case 2 : if (i === j) break; write("i = " , i); read(i, j); case 4 : write("four"); i = 12; case 6 : write("six"); case -9 : case 9 : case -10 : case 10 : write("plus or minus nine or ten"); i = 12; default : write("not 2, 4, 6, 9 or 10"); }
by generating code matching an equivalent set of IfStatements, effectively on the lines of
temp = i + j; if (temp == 2) { if (i === j) goto out; write("i = " , i); read(i, j); goto out; } elsif (temp == 4) { write("four"); i = 12; goto out; } elsif (temp == 6) { write("six"); goto out; } elsif (temp in (-9, 9, -10, 10)) { write("plus or minus nine or ten"); i = 12; goto out; } else write("not 2, 4, 6, 9 or 10"); out: ...
The temp value needed can be stored on the stack - if we can duplicate the value on the top of the stack before each successive comparison or test for list membership is effected, we can ensure that the value of the selector is preserved, in readiness for the next comparison. This calls for yet another simple opcode for the PVM, which can be generated by calling:
public static void duplicate() { // Generates code to push another copy of top of stack emit(PVM.dup); }
and whose interpretation is as follows:
case PVM.dup: // duplicate top of stack cpu.sp--; if (inBounds(cpu.sp)) mem[cpu.sp] = mem[cpu.sp + 1]; break;
Although this idea does not lead to a highly efficient implementation of the SwitchStatement, it is relatively easy to implement - the complexity arising from the need, as usual, to impose semantic checks that all labels are unique, that the type of the selector is compatible with the type of each label, and from a desire to keep the number of branching operations as low as possible. To effect the test for membership of a list of labels like the - 9, 9, -10 and 10 of the examples we play the same game again - another code generation routine:
public static void membership(int count, int type) { // Generates code to check membership of a list of count expressions if (count == 1) comparison(CodeGen.ceq, type); else { emit(PVM.memb); emit(count); } }
with a two word opcode interpreted as follows:
case PVM.memb: // membership test boolean isMember = false; loop = next(); int test = mem[cpu.sp + loop]; for (int m = 0; m < loop; m++) if (pop() == test) isMember = true; mem[cpu.sp] = isMember ? 1 : 0; break;
The code for the parsing routine follows:
SwitchStatement<StackFrame frame> (. int expType, labelCount; boolean branchNeeded = false; Label nextSwitch = new Label(!known); Label switchExit = new Label(!known); ArrayList<Integer> labelList = new ArrayList<Integer>(); .) = "switch" "(" Expression<out expType> (. if (isRef(expType) || expType == Entry.noType) SemError("invalid selector type"); .) ")" "{" { (. if (branchNeeded) CodeGen.branch(switchExit); branchNeeded = true; nextSwitch.here(); nextSwitch = new Label(!known); CodeGen.duplicate(); .) CaseLabelList<out labelCount, expType, labelList> (. CodeGen.membership(labelCount, expType); CodeGen.branchFalse(nextSwitch); .) Statement<frame, switchExit> { Statement<frame, switchExit> } } ( "default" ":" (. if (branchNeeded) CodeGen.branch(switchExit); nextSwitch.here(); .) { Statement<frame, switchExit> } | (. nextSwitch.here(); .) ) "}" (. switchExit.here(); CodeGen.pop(1); .) . CaseLabelList<. out int labelCount, int expType, ArrayList<Integer> labelList .> = CaseLabel<expType, labelList> (. labelCount = 1; .) { CaseLabel<expType, labelList> (. labelCount++; .) } . CaseLabel<. int expType, ArrayList<Integer> labelList .> (. ConstRec con; int factor = 1; boolean signed = false; .) = "case" [ ( "+" | "-" (. factor = -1; .) ) (. signed = true; .) ] Constant<out con> ":" (. if (!compatible(con.type, expType) || signed && con.type != Entry.intType) SemError("invalid label type"); int lab = factor * con.value; if (labelList.contains(lab)) SemError("duplicated case label"); else labelList.add(lab); CodeGen.loadConstant(lab); .) .
Notes
switch (selector) { case 20: case 30: case 40: statement 1; case 50: statement 2; default: statement 3; }
which gives rise to code like
selector DUP LDC 20 LDC 30 LDC 40 MEMB 3 BZE next statement 1 BRN exit next DUP LDC 50 CEQ BZE default statement 2 BRN exit default statement 3 exit DSP -1 ...