Prac 21 Solutions

Computer Science 3 - 2001

Programming Language Translation

Practical for week 21 beginning 17 September 2001 - Solutions

Complete solutions (sources) can be found on the course WWW pages in the files PRAC21A.ZIP.

The solutions below just outline the important points:

Task 3 - debugging pragma.

Most people got this correct.

PRAGMAS
  DebugOn    = "$D+" .          (. Report->debugging = true; .)
  DebugOff   = "$D-" .          (. Report->debugging = false; .)

We use the Report interface, as this allows one to access this mode in all modules of the system, including the main program (which meant changing CS301.FRM).

The pragma can be tested in various places, for example:

  Block
  =                             (. int framesize = 0; .)
     SYNC { (  ConstDeclarations | VarDeclarations<framesize> )
     SYNC }                     (. /* reserve space for variables */
                                   CGen->openstackframe(framesize); .)
     "BEGIN" StatementSequence  (. CGen->leaveprogram();
                                   if (Report->debugging) Table->printtable(stdout); .)
     "END" .

  Statement
  =  SYNC [   Assignment
            | IfStatement      | WhileStatement
            | ReadStatement    | WriteStatement
            | ReturnStatement  | CaseStatement
            | "STACKDUMP"       (. if (Report->debugging) CGen->dump(); .)

Task 4 - Range checking

Again we have a suitable pragma

PRAGMAS
  RangeOn    = "$R+" .          (. rangeChecking = true; .)
  RangeOff   = "$R-" .          (. rangeChecking = false; .)

Here rangeChecking is local to the parser. This is then tested within Designator; not everyone got this correct. There is no need to push the size of array onto the stack if it is not to be used. To do so only means the stack gets corrupted at run time!

     ( "["                      (. if (!found) { entry.v.scalar = false; Table->enter(entry); }
                                   if (entry.v.scalar) SemError(204); .)
       Expression<exptype>      (. if (!arithtypes.memb(exptype)) SemError(227);
                                   if (rangeChecking) {
                                   /* determine size for bounds check */
                                     CGen->stackconstant(entry.v.size);
                                     CGen->subscript();
                                   }
                                   else CGen->index(); .)
       "]"
       |                        (. if (!found) Table->enter(entry);
                                   if (!entry.v.scalar) SemError(205); .)
     ) .

Task 5 - You had better do this one or else ...

This could be done as below. You must make sure that code is correctly generated whether or not the ELSE part appears (many beginners produce solutions that are correct only if it is present).

  IfStatement
  =                             (. CGEN_labels testlabel, outlabel; .)
     "IF" Condition "THEN"      (. CGen->jumponfalse(testlabel, CGen->undefined); .)
     StatementSequence
     (   "ELSE"                 (. CGen->jump(outlabel, CGen->undefined);
                                   CGen->backpatch(testlabel); .)
         StatementSequence      (. CGen->backpatch(outlabel); .)
       | /* no else part */     (. CGen->backpatch(testlabel); .)
     )
     "END" .

Task 6 - Constant Expressions

This was very badly done; most submissions still had errors. In discussions with students it had been suggested that you should produce a set of routines for parsing expressions that did not generate code, but rather evaluated the expressions, whose factors should all have been checked for being constants, and whose other components should have been checked for type consistency. There was no need to restrict such expressions to being arithmetic only. The "constant expression" parser would have been called from three places - in the declaration parser of OneConst, in the parser for UpperBound, and in the parser for Range in handling CASE statements. Of course, besides returning a value, such expression parsers need to return a type as before. To handle OneConst and UpperBound we might proceed as follows

  OneConst
  =                             (. TABLE_entries entry; .)
     Ident<entry.name>          (. entry.idclass = TABLE_consts; .)
     WEAK "="
     CExpression<entry.type, entry.c.value>
     ";"                        (. Table->enter(entry); .) .

  UpperBound<int &size>
  =                             (. TABLE_types type; .)
    "[" CExpression<type, size> (. if (!arithtypes.memb(type) || size < 0) SemError(226);
                                   size++; .)
    "]" .

The parsing routines for expressions themselves would look like this:

  CExpression<TABLE_types &e, int &value>
  =                             (. TABLE_types a;
                                   int aval; .)
     CAndExp<e, value>
     { "OR"
       CAndExp<a, aval>         (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                     { SemError(218); e = TABLE_none; }
                                   value = value || aval; .)
     } .

  CAndExp<TABLE_types &a, int &value>
  =                             (. TABLE_types e;
                                   int eval; .)
     CRelExp<a, value>
     { "AND"
       CRelExp<e, eval>         (. if (!(booltypes.memb(a) && booltypes.memb(e)))
                                     { SemError(218); a = TABLE_none; }
                                   value = value && eval; .)
     } .

  CRelExp<TABLE_types &r, int &value>
  =                             (. TABLE_types a;
                                   int aval;
                                   CGEN_operators op; .)
     CAddExp<r, value>
     [ RelOp<op>
       CAddExp<a, aval>         (. if (r == TABLE_bools || a == TABLE_bools) SemError(218);
                                   r = TABLE_bools;
                                   switch (op) {
                                     case CGEN_opeql : value = value == aval; break;
                                     case CGEN_opneq : value = value != aval; break;
                                     case CGEN_oplss : value = value <  aval; break;
                                     case CGEN_opleq : value = value <= aval; break;
                                     case CGEN_opgtr : value = value >  aval; break;
                                     case CGEN_opgeq : value = value >= aval; break;
                                     default : break;
                                   } .)
     ] .

  CAddExp<TABLE_types &a, int &value>
  =                             (. TABLE_types m;
                                   int mval;
                                   CGEN_operators op; .)
     CMultExp<a, value>
     { AddOp<op>
       CMultExp<m, mval>        (. if (!(arithtypes.memb(a) && arithtypes.memb(m)))
                                     { SemError(218); a = TABLE_none; }
                                   else switch (op) {
                                     case CGEN_opadd : value += mval; break;
                                     case CGEN_opsub : value -= mval; break;
                                     default : break;
                                   } .)
     } .

  CMultExp<TABLE_types &m, int &value>
  =                             (. TABLE_types u;
                                   int uval;
                                   CGEN_operators op; .)
     CUnaryExp<m, value>
     { MulOp<op>
       CUnaryExp<u, uval>       (. if (!(arithtypes.memb(m) && arithtypes.memb(u)))
                                     { SemError(218); m = TABLE_none; }
                                   else switch (op) {
                                     case CGEN_opmul :
                                       value *= uval; break;
                                     case CGEN_opmod :
                                       if (uval != 0) value %= uval; else SemError(224);
                                       break;
                                     case CGEN_opdvd :
                                       if (uval != 0) value /= uval; else SemError(224);
                                       break;
                                     default : break;
                                   } .)
     } .

  CUnaryExp<TABLE_types &u, int &value>
  =    CFactor<u, value>
     | "+" CUnaryExp<u, value>  (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; } .)
     | "-" CUnaryExp<u, value>  (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; }
                                     value = - value; .)
     | "NOT" CUnaryExp<u, value> (. if (!booltypes.memb(u)) SemError(218);
                                     value = ! value;
                                     u = TABLE_bools; .) .

  CFactor<TABLE_types &f, int &value>
  =                             (. TABLE_entries entry;
                                   TABLE_alfa name;
                                   bool found; .)
       Ident<name>              (. Table->search(name, entry, found);
                                   if (!found) SemError(202);
                                   if (entry.idclass != TABLE_consts) {
                                     SemError(206); value = 1; // play safe
                                   } else value = entry.c.value;
                                   f = entry.type; .)
     | Number<value>            (. f = TABLE_ints; .)
     | "TRUE"                   (. f = TABLE_bools; value = 1; .)
     | "FALSE"                  (. f = TABLE_bools; value = 0; .)
     | "(" CExpression<f, value> ")" .

There is, of course, another way of doing this, and in retrospect, having seen what heavy weather people made of my suggestions, it might have been better to follow this other approach. Here we would have but one set of parsing routines, which would be called with an extra value parameter denoting whether we required the code to be generated for later execution or not, and with an extra reference parameter returning the value of the expression as one "evaluated' it at compile time (if variables are encountered in the Factor parser one just returns a "safe' value instead). With this approach one might proceed as follows

At the tope of the grammar file define two useful constants

  #define CONSTANT false
  #define CODE true

The OneConst parser would then become

  OneConst
  =                             (. TABLE_entries entry; .)
     Ident<entry.name>          (. entry.idclass = TABLE_consts; .)
     WEAK "="
     Expression<entry.type, entry.c.value, CONSTANT>
     ";"                        (. Table->enter(entry); .) .

And the Condition parser, for example, would become:

  Condition
  =                             (. TABLE_types exptype;
                                   int value; .)
     Expression<exptype, value, CODE>
                                (. if (!booltypes.memb(exptype)) SemError(221) .) .

It will suffice to illustrate only one or two of the expression parsers:

  Expression<TABLE_types &e, int &value, bool code>
  =                             (. TABLE_types a;
                                   int aval;
                                   CGEN_labels shortcircuit; .)
     AndExp<e, value, code>
     { "OR"
        AndExp<a, aval, code>    (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                     { SemError(218); e = TABLE_none; }
                                   if (code) CGen->binarybooleanop(CGEN_orrop); .)
                                   value = value || aval;
     } .

  AddExp<TABLE_types &a, int &value, bool code>
  =                             (. TABLE_types m;
                                   int mval;
                                   CGEN_operators op; .)
     MultExp<a, value, code>
     { AddOp<op> MultExp<m, mval, code>
                                (. if (!(arithtypes.memb(a) && arithtypes.memb(m)))
                                     { SemError(218); a = TABLE_none; }
                                   if (code) CGen ->binaryintegerop(op);
                                   switch (op) {
                                     case CGEN_opadd : value += mval; break;
                                     case CGEN_opsub : value -= mval; break;
                                     default : break;
                                   } .)
     } .

The greatest changes come about in Factor:

  Factor<TABLE_types &f, int &value, bool code>
  =                             (. TABLE_entries entry;
                                   classset allowed;
                                   if (code) allowed = classset(TABLE_consts, TABLE_vars);
                                   else allowed = classset(TABLE_consts);
                                   value = 1; // play safe! .)
       Designator<allowed, entry>
                                (. f = entry.type;
                                   if (code) {
                                     switch (entry.idclass)
                                     { case TABLE_vars :
                                         CGen->dereference(); break;
                                       case TABLE_consts :
                                         CGen->stackconstant(entry.c.value); break;
                                     }
                                   }
                                   else {
                                     if (entry.idclass != TABLE_consts) SemError(206);
                                     else value = entry.c.value;
                                   } .)
     | Number<value>            (. if (code) CGen->stackconstant(value); f = TABLE_ints; .)
     | "TRUE"                   (. value = 1; if (code) CGen->stackconstant(1); f = TABLE_bools .)
     | "FALSE"                  (. value = 0; if (code) CGen->stackconstant(0); f = TABLE_bools .)
     | "(" Expression<f, value, code> ")" .

In the solution kit (PRAC21A.ZIP) you can see two attributed grammars - one using the double set of parser routines (SOL1.ATG) and the other using this combined approach (CS301.ATG).

Task 7 - better control over output

Again, a mixture of attempts was received. The best way to do this is to redefine the way the STKMC_prn and STKMC_prb opcodes work, so that they expect to find the width parameter on the stack, as well as the value to be printed. Then we can proceed as follows (this illustrates the code that matches the "double set of expressions" approach.

  WriteElement
  =                             (. TABLE_types exptype, formtype;
                                   char str[600];
                                   CGEN_labels startstring; .)
      String<str>               (. CGen->stackstring(str, startstring);
                                   CGen->writestring(startstring); .)
    | Expression<exptype>
      ( ":"
          Expression<formtype>  (. if (formtype != TABLE_ints) SemError(218); .)
        |                       (. CGen->stackconstant(6) /* default */ .)
      )                         (. switch (exptype)
                                   { case TABLE_ints  : CGen->writevalue(); break;
                                     case TABLE_bools : CGen->writeboolean(); break;
                                     case TABLE_none  : /* error already */    break;
                                   } .) .

with the extended machine operations reading

        case STKMC_prn:
          if (tracing) fputs(BLANKS, results);
          cpu.sp +=2;
          if (inbounds(cpu.sp))
            fprintf(results, "%*d", mem[cpu.sp - 2], mem[cpu.sp - 1]);
          if (tracing) putc('\n', results);
          break;
        case STKMC_prb:
          if (tracing) fputs(BLANKS, results);
          cpu.sp +=2;
          if (inbounds(cpu.sp))
          { if (mem[cpu.sp - 1] != 0)
              fprintf(results, "%*s", mem[cpu.sp - 2], "TRUE");
            else fprintf(results, "%*s", mem[cpu.sp - 2], "FALSE");
          }
          if (tracing) putc('\n', results);
          break;

Many people got quite close to this, but forgot to pop two values off the stack. I really cannot believe that they tested their compilers out at all, as this would quickly have shown up, or given memory violation errors!

Note that Coco/R has a very useful feature, which some of you have not appreciated. While it is usual to write an optional part in [ brackets ], sometimes one want to be able to attach an action to the situation where the option is absent. Coco/R allows this as follows

          Something = (   Present   (. ActionIfPresent .)
                        |           (. ActionIfAbsent .)
                      ) .

Task 8 - Case statements

This was probably the task that needed most thought, and many people had not given it enough, in spite of my giving fairly explicit suggestions, and having quite extensive further discussions with several groups. Please make a point of studying the code below in detail, and if you still do not understand, then come to ask. Once again, this solution is illustrated in terms of the "double set of expressions" grammar.

One subtlety that virtually everyone missed is that OneCase is nullable. That means that one only generates the implicit "break" branch instructions for each OneCase that actually results in a StatementSequence being parsed. Many people tried code like this

  CaseStatement
  =                             (. CGEN_labels startcase, endcase, dummy; .)
    "CASE"
      Expression
    "OF"                        (. CGen->jump(startcase, CGen->undefined); .)
      OneCase                   (. CGen->jump(endcase, CGen->undefined); .)
      { "|" OneCase             (. CGen->jump(dummy, endcase); .)
      }
    "DEFAULT" ":"
      StatementSequence         (. CGen->jump(dummy, endcase);
                                   CGen->backpatch(startcase);
                                   CGen->fixcase(...);
                                   CGen->backpatch(endcase);
                                .)
    "END" .

which, while quite cute and getting over the multiple backpatch problem, would not have generated very nice code if the OneCase productions had not generated any code.

Here is what the OneCase production might look like. We need to pass the exptype returned by Expression into each of the OneCase calls to allow type checking to take place. We need these calls to return us the minimum and maximum values of the source labels themselves, and the number of these labels, so that we can generate a reasonable way of "switching" when the time comes. We also need a LabelList for each CASE statement, and this has to be passed to the subparsers. Finally, we need to be able to set up the chain of implicit breakpoints, and so the breaklabel parameter must be passed down to OneCase, which modifies it if necessary.

  CaseStatement
  =                             (. TABLE_types exptype;
                                   CGEN_labels startcase, defaultlabel;
                                   CGEN_labels breaklabel = CGen->undefined;
                                   int min = 0, max = 0, nlabels = 0;
                                   LabelList L; .)
    "CASE"
      Expression<exptype>
    "OF"                        (. CGen->jump(startcase, CGen->undefined); .)
      OneCase<L, breaklabel, exptype, min, max, nlabels>
      { "|" OneCase<L, breaklabel, exptype, min, max, nlabels>
      }
    "DEFAULT" ":"               (. CGen->storelabel(defaultlabel); .)
      StatementSequence         (. CGen->jump(breaklabel, breaklabel);
                                   CGen->backpatch(startcase);
                                   CGen->fixcase(L, min, max, nlabels, defaultlabel);
                                   CGen->backpatchlist(breaklabel);
                                .)
    "END" .

You may well be puzzled at the way in which OneCase generates the implicit break statements. The CGen->jump(breaklabel, breaklabel) probably looks ridiculous. What it does is to set up a sequence of these break statements that each points back to the previous one.

  OneCase<LabelList &L, CGEN_labels &breaklabel,
          TABLE_types exptype, int &min, int &max, int &nlabels>
  = [
      Range<L, exptype, min, max, nlabels>
      { WEAK "," Range<L, exptype, min, max, nlabels>
      }
      ":"
      StatementSequence         (. CGen->jump(breaklabel, breaklabel); .)
    ]
    .

This sequence is then "walked" at compile time by the new code generating function backpatchlist at the end of parsing the CASE statement itself; the code below showing how each of the break statements is cleverly converted into a forward branch to the common exit point. This is the technique suggested in the text in Exercise 15.8 (where it is applied to the problem of multiple exits from a loop). It could be useful in other situations - such as handling "goto" statements if we were to introduce those to our language.

  void CGEN::backpatchlist(CGEN_labels location)
  { CGEN_labels nextlabel;
    while (location) {
      nextlabel = Machine->mem[location+1];
      Machine->mem[location+1] = codetop;
      location = nextlabel;
    }
  }

The Range parser is responsible for several things - it must set up the list of source label/object label pairs in LabelList L, determine the minimum and maximum values encountered as source labels, and check that the source labels are all of the correct type. Notwithstanding this, it is fairly straightforward:

  Range<LabelList &L, TABLE_types exptype, int &min, int &max, int &nlabels>
  =                             (. int low;
                                   caseLabel oneCase;
                                   TABLE_types type; .)

    CExpression<type, low>      (. int high = low;
                                   if (nlabels == 0 || low < min) min = low;
                                   if (nlabels == 0 || low > max) max = low;
                                   if (exptype != type) SemError(218); .)
    [ ".."
    CExpression<type, high>     (. if (exptype != type) SemError(218);
                                   if (high > max) max = high; .)
    ]                           (. if (high < low) SemError(227);
                                   else for (int opt = low; opt <= high; opt++) {
                                     oneCase.opt = opt;
                                     CGen->storelabel(oneCase.lab);
                                     nlabels++;
                                     if (L.isMember(oneCase)) SemError(228);
                                     else L.add(oneCase);
                                   }
                                .) .

Of course, at the end of the CASE statement parser we have the fun task of setting up the switching mechanism itself. The solution below illustrates how one might choose between the run-time efficient SEL idea suggested in the tutorial (simply indexing into a list of addresses, but one which might have many instances of the "default" label taking up space) or the less run-time efficient SWI idea, where at run time one has to do a linear search for a match to the run-time value of the selector expression.

  void CGEN::fixcase(LabelList L, int min, int max, int nlabels, CGEN_labels defaultlabel)
  { caseLabel oneCase;
    if (max - min > 4 * nlabels) {
      emit(int(STKMC_swi)); emit(nlabels);
      while (L.length() > 0) {
        L.remove(0, oneCase);
        emit(oneCase.opt); emit(oneCase.lab);
      }
      emit(int(defaultlabel));
    }
    else {
      emit(int(STKMC_sel)); emit(min); emit(max); emit(defaultlabel);
      int * labels = new int[max - min + 1];
      for (int k = 0; k < max - min + 1; k++) labels[k] = defaultlabel;
      while (L.length() > 0) {
        L.remove(0, oneCase);
        labels[oneCase.opt - min] = oneCase.lab;
      }
      for (int k = 0; k < max - min + 1; k++) emit(labels[k]);
      delete [] labels;
    }
  }

The interpreter routines look like this:

        case STKMC_swi:
          cpu.sp++;
          if (inbounds(cpu.sp))
          { int ii = cpu.pc + 1;
            int cc = 1;
            while (mem[cpu.sp-1] != mem[ii] && cc <= mem[cpu.pc]) {
              ii += 2; cc++;
            }
            if (cc <= mem[cpu.pc]) cpu.pc = mem[ii+1]; else cpu.pc = mem[ii];
          }
          break;

        case STKMC_sel:
          cpu.sp++;
          if (inbounds(cpu.sp))
          { if (   mem[cpu.sp-1] < mem[cpu.pc]
                || mem[cpu.sp-1] > mem[cpu.pc+1]) cpu.pc = mem[cpu.pc+2];
            else cpu.pc = mem[cpu.pc + 3 + mem[cpu.sp-1] - mem[cpu.pc]];
          }
          break;

It is important to realise that neither of these ideas would work very well if we had CASE statements where the range of the source labels was very large - for example:

    CASE I OF                       /
           1 .. 1000 : Write('First Millenium');
      | 1001 .. 2000 : Write('Second Millenium');
      DEFAULT :
    END;

In general generating good code for CASE statements requires much more sophistication than we have illustrated. There are several articles and books showing some of the ideas, but we shall let the matter rest for the moment.

Task 9 - Short circuit Booleans

Only a few groups tried this, and none got it near correct. To do so, it is easiest to introduce new machine operations, and generate code to use them:

  Expression<TABLE_types &e>
  =                             (. TABLE_types a;
                                   CGEN_labels shortcircuit; .)
     AndExp<e>
     { "OR"                     (. if (shortBoolean) CGen->booleanop(shortcircuit, CGEN_opor) .)
       AndExp<a>                (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                     { SemError(218); e = TABLE_none; }
                                   else e = TABLE_bools;
                                   if (shortBoolean) CGen->backpatch(shortcircuit);
                                   else CGen->binarybooleanop(CGEN_orrop); .)
     } .

  AndExp<TABLE_types &a>
  =                             (. TABLE_types e;
                                   CGEN_labels shortcircuit; .)
     RelExp<a>
     { "AND"                    (. if (shortBoolean) CGen->booleanop(shortcircuit, CGEN_opand) .)
       RelExp<e>                (. if (!(booltypes.memb(a) && booltypes.memb(e)))
                                     { SemError(218); a = TABLE_none; }
                                   else a = TABLE_bools;
                                   if (shortBoolean) CGen->backpatch(shortcircuit);
                                   else CGen->binarybooleanop(CGEN_andop); .)
     } .

with the new operations

        case STKMC_ban:
          if (!mem[cpu.sp]) cpu.pc = mem[cpu.pc]; else { cpu.sp++; cpu.pc++; };
          break;
        case STKMC_bor:
          if (mem[cpu.sp]) cpu.pc = mem[cpu.pc];  else { cpu.sp++; cpu.pc++; };
          break;