Tutorial for week 26

Computer Science 301 - 2009

Tutorial for week 26 - Code generation - Solutions

(a) A BYT (Bright Young Thing) has been nibbling away at writing extensions to her first Parva compiler. It has been suggested that a function that will return the maximum element from a variable number of arguments would be a desirable addition, one that might form part of an expression as in

a = max(x, y, z) + 5 - max(a, max(c, d));

and that this could be achieved by extending the production for a Primary in a fairly obvious way, and adding a suitable opcode to the PVM. The production for Primary in the simple Parva compiler is defined as follows:

        Primary<out int type>                 (. type = Entry.noType;
                                                 int size;
                                                 DesType des;
                                                 ConstRec con; .)
        =    Designator<out des>              (. type = des.type;
                                                 switch (des.entry.kind) {
                                                   case Entry.Var:
                                                     CodeGen.dereference();
                                                     break;
                                                   case Entry.Con:
                                                     CodeGen.loadConstant(des.entry.value);
                                                     break;
                                                   default:
                                                     SemError("wrong kind of identifier");
                                                     break;
                                                 } .)
           | Constant<out con>                (. type = con.type;
                                                 CodeGen.loadConstant(con.value); .)
           | "new" BasicType<out type>        (. type++; .)
             "[" Expression<out size>         (. if (!isArith(size))
                                                   SemError("array size must be integer");
                                                 CodeGen.allocate(); .)
             "]"
           | "(" Expression<out type> ")" .

How would the Primary production and the interpreter need to be changed to support this extension?

Allow your max function to take one or more arguments, and ensure that it can only be applied to arithmetic arguments. Assume that a suitable CodeGen routine can be introduced to generate any new opcodes required.

The addition to the Primary production would consist of another case arm. Here are two possibilities:

           | "max" "(" Expression<out type>   (. if (!isArith(type))
                                                   SemError("argument must be numeric"); .)
             { "," Expression<out type>       (. if (!isArith(type))
                                                   SemError("argument must be numeric");
                                                 CodeGen.max(); .)
             } ")"

           | "MAX" "(" Expression<out type>   (. int count = 1;
                                                 if (!isArith(type))
                                                   SemError("argument must be numeric"); .)
             { "," Expression<out type>       (. count++;
                                                 if (!isArith(type))
                                                   SemError("argument must be numeric"); .)
             } ")"                            (. CodeGen.max2(count); .)

In the second case, the new opcode MAX2 N has the number of expressions that have been stacked up as its argument N. Finding the maximum then can be done by popping pairs of values and pushing back the larger. In terms of operations suggested for the opcodes given earlier this might be achieved as follows

          case PVM.max:           // max(tos, sos)
            tos = pop();
            sos = pop();
            if (tos > sos) push(tos); else push(sos);
            break;

          case PVM.max2:          // max(a,b,c....)
            int count = next();
            while (count > 1) {
              tos = pop();
              sos = pop();
              if (tos > sos) push(tos); else push(sos);
              count--;
            }
            break;

but of course other code is possible (and more efficient) if it manipulates cpu.sp directly.

(b) A BYT (Bright Young Thing) has been nibbling away at writing extensions to her first Parva compiler, while learning the Python language at the same time. She has been impressed by a Python feature which allows one to write multiple assignments into a single statement, as exemplified by

          A, B = X + Y, 3 * List[6]; 
          A, B = B, A;                // exchange A and B 
          A, B = X + Y, 3, 5;         // incorrect

which she correctly realises can be described by the context-free production

          Designator { "," Designator } "=" Expression { "," Expression } ";"

The Parva attributed grammar that she has been given deals with single, simple integer assignments only (type checking not needed in the examination):

          Assignment                       (. DesType des; .)
          =  Designator<out des>           (. if (des.entry.kind != Entry.Var)
                                                SemError("invalid assignment"); .)
             "="
             Expression                    (. CodeGen.assign(); .)
             ";" .

where the CodeGen.assign() method generates a code that is matched in the interpreter by a case arm reading

          case PVM.sto:
            //   store value at top of stack on address at second on stack
            mem[mem[cpu.sp + 1]] = mem[cpu.sp];
            //   bump stack pointer
            cpu.sp = cpu.sp + 2;
            break;

How would the Assignment production and the interpreter need to be changed to support this extension?

Essentially the idea is to generate code that will first push a set of destination addresses onto the stack, followed by a matching list of expression values. A special assignment opcode can then arrange to work through the list making the required number of assignments.

To ensure that there are as many expressions as designators it is easiest to count both, and then make a simple check when the terminating semicolon is identified. This count can be used as an argument to a new PVM.sto opcode. Devoid of type checking:

          Assignment                       (. DesType des;
                                              int desCount = 1, expCount = 1; .)
          =  Designator<out des>           (. if (des.entry.kind != Entry.Var)
                                                SemError("invalid assignment"); .)
             { "," Designator<out des>     (. if (des.entry.kind != Entry.Var)
                                                SemError("invalid assignment");
                                              desCount++; .)
             }
             "="
             Expression
             { "," Expression              (. expCount++; .)
             }                             (. if (expCount != desCount)
                                                SemError("left and right counts disagree");
                                              CodeGen.assign(desCount); .)
             ";" .

If we wish to add type checking we must check not only that the numbers of designators and expressions agree, but also that the types of the expressions are compatible with the types of the designators. This last check requires that we build up a list of the designator types - and when we examine this list later we must take care to avoid disaster in retrieving more designator types than have been added to the list! Not all the assignments have to be of the same "type".

          Assignment                       (. int expType;
                                              DesType des;
                                              int count = 0;
                                              int desCount = 1, expCount = 1;
                                              ArrayList<Integer> typeList = new ArrayList<Integer>(); .)
          =  Designator<out des>           (. if (des.entry.kind != Entry.Var)
                                                SemError("invalid assignment");
                                              typeList.add(des.type); .)
             { "," Designator<out des>     (. if (des.entry.kind != Entry.Var)
                                                SemError("invalid assignment");
                                              typeList.add(des.type);
                                              desCount++; .)
             }
             "="
             Expression<out expType>       (. if (count < typeList.size()
                                                  && !compatible(typeList.get(count++), expType))
                                                SemError("incompatible types in assignment"); .)
             { "," Expression<out expType> (. if (count < typeList.size()
                                                  && !compatible(typeList.get(count++), expType))
                                                SemError("incompatible types in assignment");
                                              expCount++; .)
             }                             (. if (expCount != desCount)
                                                SemError("left and right counts disagree");
                                              if (desCount == 1) CodeGen.assign(des.type);
                                              else CodeGen.assignN(desCount); .)
             WEAK ";" .

The new code generator routine is simple:

          public static void assignN(int n) {
          // Generates code to store n values currently on top-of-stack on the n addresses
          // below these, finally discarding 2*n elements from the stack.
            emit(PVM.ston); emit(n);
          }

and interpretation is fairly straightforward:

          case PVM.ston:          // store n values at top of stack on addresses below them on stack
            int n = next();
            for (int i = n - 1; i >= 0; i--)
              mem[mem[cpu.sp + n + i]] = mem[cpu.sp + i]; // do the assignments
                                                          // then bump stack pointer to
            cpu.sp = cpu.sp + 2 * n;                      // discard addresses and values
            break;

(c) Extend Parva to provide a relational operator for determining whether the value of an expression matched one of the values in a list, as exemplified by the following code

          exp in (a, b, c)
          itsASmallPrime = i in (2, 3, 5, 7, 11, 13);
          if (list[i] in (i, 2 * i, 3 * i)) write("list[i] is a small multiple of its index");

We add the in operator into the hierarchy at the same level as the other relation operators, though of course the syntax is somewhat different:

          RelExp<out int type>             (. int type2;
                                              int op;
                                              int count; .)
          =  AddExp<out type>
             [ RelOp<out op>
                  AddExp<out type2>        (. if (!isArith(type) || !isArith(type2))
                                                SemError("incomparable operands");
                                              type = Entry.boolType; CodeGen.comparison(op); .)
                | "in"
                  ExpList<out count, type> (. type = Entry.boolType; CodeGen.membership(count); .)
             ] .

Parsing the ExpList must include the check that all the members of the list are of a type compatible with the expression that is to be tested for membership. This routine returns the number of elements in the list, and generates code to push the values of the expressions onto the stack.

          ExpList<out int count, int type>  (. int type2;
                                               count = 1; .)
          = "("
               Expression<out type2>        (. if (!compatible(type, type2))
                                                 SemError("incompatible types"); .)
               { "," Expression<out type2>  (. if (!compatible(type, type2))
                                                 SemError("incompatible types");
                                               count++; .)
               }
            ")" .

Note that the compatibility tests are done after each Expression has been parsed, and not left until the end. A special opcode is used to test for membership. Code generation is easy:

          public static void membership(int count) {
          // Generates code to check membership of a list of count expressions
            emit(PVM.memb); emit(count);
          }

while interpretation requires that the value of each of the expressions is popped off the stack and checked to see whether it matches the value being tested for membership. Finally the last value is popped, and a pseudo-boolean value left on top of the stack:

          case PVM.memb:          // membership test
            boolean isMember = false;
            loop = next();
            int test = mem[cpu.sp + loop];
            for (int l = 0; l < loop; l++) if (pop() == test) isMember = true;
            mem[cpu.sp] = isMember ? 1 : 0;
            break;