Tutorial 11 - Solutions

Computer Science 301 - Translators

Tutorial 11 - Code generation for a "for" loop - Solutions

Many languages provide for a ForStatement in one or other form. Although most people are familiar with these, their semantics and implementation can actually be quite tricky.

Suppose we were to add a simple Pascal-style for loop to Parva, to allow statements with concrete syntax defined by

       ForStatement = "for" Designator "=" Expression ("to" | "downto") Expression "do" Statement .

What would you suggest in the way of constraints, how would you enforce the constraints, and how would you generate code for these sorts of statements?

The static semantic constraints are easily specified:

The Designator must correspond to a variable (not a constant or function)
The type of the Designator must be boolType or intType
The type of each of the Expressions must be compatible with the type of the Designator.

The dynamic semantics at first sight look easy. The ForStatement is often explained to beginners as being a shorthand form of writing a while loop. For example, the statements

        for i = 1 to 10 do write(i); 
        for i = 12 downto 4 do write(i);

seem to be equivalent to

         i = 1;                            i = 12;
         while (i <= 10) {                 while (i >= 4) {
           write(i);                         write(i);
           i = i + 1;                        i = i - 1;
         }                                 }

but at this point the language lawyers will point out that one interpretation of the ForStatement should suggest that when the loop terminates, the control variable should be left with the final value specified in the statement (10 in the first example, 4 in the second example) whereas the code above will leave it with different values (11 in the first example, 3 in the second example).

Language lawyers will then go on to ask questions like "what about ForStatements like these?"

        for i = 10 to 1 do write(i); 
        for i = 4 downto 12 do write(i);

and, when it is pointed out that these seem to be handled by the same sort of equivalences

         i = 10;                           i = 4;
         while (i <= 1) {                  while (i >= 12) {
           write(i);                         write(i);
           i = i + 1;                        i = i - 1;
         }                                 }

which (correctly) never execute the "body" of the loop, the lawyers go on to point out that "in that case perhaps the control variable should not be left with the initial value specified in the ForStatement, and maybe should not have been altered at all, or should have been left with the final value specified in the ForStatement, and isn't it about time you got all this consistent?"

This is the point when the folk writing the language specification usually try to outwit the lawyers by stating that "after the loop has finished executing, the control variable is left with an undefined value", which is what happens in the case of languages like Pascal and Modula-2. In C-like languages the for loop is defined differently, of course, and you might like to pursue these sorts of questions further in that context if you are feeling strong.

Worse is still to come. Even if we sidestep the issue of leaving the control variable with a consistently predictable final value, consider an example like

        for i = i + 4 to i + 10 do write(i);

Here the obvious equivalent code leads to an potentially infinite loop

         i = i + 4;
         while (i <= i + 10) {
           write(i);
           i = i + 1;
         }

In principle, of course, the condition i <= i + 10 would now always be true, although the loop would probably terminate when the value assigned to i overflowed the capacity of an integer.

All this is starting to look very unsatisfactory. One more example may horrify the reader. Consider

        for i = 0 to maxInteger do write(i);

where maxInteger represents the upper limit for values of the integer type (32767 for 16-bit compilers, for example). This surely should be well-behaved? Indeed, but code like this

         i = 0;
         while (i <= maxInteger) {
           write(i);
           i = i + 1;
         }

cannot work, as the last assignment made to i will overflow, and the loop will not terminate.

Many of these issues can be overcome by a fairly dramatic process. We arrange that the code generation for the generic loops

        for Variable = Expression1 to Expression2 do Statement 
        for Variable = Expression1 downto Expression2 do Statement

must yield lower level code of the form

         Temp1 := Expression1                             Temp1 := Expression1
         Temp2 := Expression2                             Temp2 := Expression2
         IF Temp1 > Temp2 THEN GOTO EXIT                  IF Temp1 < Temp2 THEN GOTO EXIT
         Variable := Temp1;                               Variable := Temp1;
  BODY:  Statement                                 BODY:  Statement
         IF Variable = Temp2 THEN GOTO EXIT               IF Variable = Temp2 THEN GOTO EXIT
         Variable := Variable + 1                         Variable := Variable - 1
         GOTO BODY                                        GOTO BODY
  EXIT:                                            EXIT:

respectively, where Temp1 and Temp2 are temporary variables. This code will not assign a value to the control variable at all if the loop body is not executed, and will leave the control variable with the "obvious" final value if the loop body is executed. It may appear to be a little awkward for an incremental compiler, since there are now multiple apparent references to extra variables and to the control variable.

The simplest way of handling all these issues in the PVM system is to note that the obvious calls to the parsing routines

           Designator
           Expression
           Expression

will ensure that code to push the address of the control variable and the values of the temporary variables will have been generated by the time the keyword do is encountered. At this point code must be generated for the initial test, and if we avail ourselves of the ability to define extensions to the opcode set of the PVM we can generate a special opcode at this point that will leave these three values on the stack so that they can be used again later. Similarly, after generating the code for the Statement we can generate a second special opcode that can be responsible for the final test and possible increment of the control variable. One last complication is that once the for loop has completed execution we have to discard these three elements from the stack, which suggests the introduction of yet another special opcode.

The production for ForStatement is now attributed as follows:

  ForStatement<StackFrame frame>        (. Label loopExit = new Label(!known);
                                           DesType des;
                                           boolean up = true;
                                           int expType; .)
  =  "for" Designator<out des>          (. if (IsRef(des.type) || des.entry.kind != Entry.Var)
                                             SemError("illegal control variable"); .)
     "=" Expression<out expType>        (. if (!compatible(expType, des.type))
                                             SemError("incompatible with control variable"); .)
     ( "to" | "downto"                  (. up = false; .)
     )
     Expression< out expType>           (. if (!compatible(expType, des.type))
                                             SemError("incompatible with control variable");
                                           CodeGen.startForLoop(up, loopExit);
                                           Label startLoop = new Label(known); .)
     "do" Statement<frame>              (. CodeGen.endForLoop(up, startLoop);
                                           loopExit.here();
                                           CodeGen.pop3(); .)
  .

which is relatively straightforward. So is the code generation:

  public static void startForLoop(boolean up, Label destination) {
  // Generates prologue test for a for loop (either up or down)
    if (up) emit(PVM.sfu); else emit(PVM.sfd);
    emit(destination.address());
  }

  public static void endForLoop(boolean up, Label destination) {
  // Generates epilogue test and increment/decrement for a for loop (either up or down)
    if (up) emit(PVM.efu); else emit(PVM.efd);
    emit(destination.address());
  }

  public static void pop3() {
  // Generates code to discard top three elements from the stack
    emit(PVM.pop3);
  }

The nitty-gritty is contained in the extensions to the interpreter, which might require careful study:

     case PVM.sfu:           // start for loop "to"
       if (mem[cpu.sp + 1] > mem[cpu.sp]) cpu.pc = mem[cpu.pc];
       else {
         mem[mem[cpu.sp + 2]] = mem[cpu.sp + 1]; cpu.pc++;
       }
       break;
     case PVM.sfd:           // start for loop "downto"
       if (mem[cpu.sp + 1] < mem[cpu.sp]) cpu.pc = mem[cpu.pc];
       else {
         mem[mem[cpu.sp + 2]] = mem[cpu.sp + 1]; cpu.pc++;
       }
       break;
     case PVM.efu:           // end for loop "to"
       if (mem[mem[cpu.sp + 2]] == mem[cpu.sp]) cpu.pc++;
       else {
         mem[mem[cpu.sp + 2]]++; cpu.pc = mem[cpu.pc];
       }
       break;
     case PVM.efd:           // end for loop "downto"
       if (mem[mem[cpu.sp + 2]] == mem[cpu.sp]) cpu.pc++;
       else {
         mem[mem[cpu.sp + 2]]--; cpu.pc = mem[cpu.pc];
       }
       break;
     case PVM.pop3:          // discard 3 elements from top of stack
       cpu.sp += 3;
       break;

Our discussion of the ForStatement is not yet over. Consider the following examples

       for i = 1 to 5 do read(i);            // should we allow this?
       for i = 1 to 5 do i = i + 1;          // should we allow this?
       for i = 1 to 5 do                     // should we allow this - nesting
         for i = 1 to 5 do write(i);         // the loops will cause trouble

These display what is known as "threatening" behaviour. The Statement that forms the body of the loop must be able to access the value of the for loop control variable, but it can be argued that it should not be allowed to assign a new value to that control variable, as this will undermine the system and lead to loops that run for erratic numbers of iterations.

There are various ways of trying to handle this problem, which in the end result is a very difficult one. One device that will handle the examples above is to mark the control variable in the symbol table at the start of parsing the loop body with a new attribute cannotChange, and then to test this attribute in statements like ReadStatement, AssignmentStatement and ForStatement (the control variable is "unmarked" again when the Statement parser has handled the loop body, of course). The details of this are left as a fairly simple but entertaining exercise.

But the problem is harder than can be handled in this way. Consider the following silly multi-function example:

    int global;

    void threaten() {
      global = 1;
    }

    void main () {
      for global = 1 to 10 threaten();
    }

To handle this we might impose a further restriction - namely that only locally declared variables can be used to control for loops. It also helps if the Designator is restricted to being a simple variable, rather than, for example, being an object, or an array element. But in larger languages more and more of these attacks can be cunningly devised. When I was on the international committee that standardized Modula-2 we spent hours trying to think of all of these attacks, producing a huge impressive list of restrictions, but were unable to find ways of beating all the attacks.

If the body of the loop can contain a BreakStatement or ContinueStatement the production for ForStatement needs extending further.