Computer Science 3 - 2000

Programming Language Translation


Practical for Week 25-26, beginning 16 October 2000 - Solutions to Tasks 3 - 6

Full sources for these are available, as usual, in the files PRAC26CA.ZIP (C++ versions) or PRAC26PA.ZIP (Pascal versions).

Solutions received were of rther variable quality. I apologise for having inflicted this on you right at the end of the year, but we really had no choice, and it was good to see that at least some people responded very well. Disappointingly, thre was evidence that some people had submitted solutions that were so close to my own that I have to assume that there was what I'll just call "unacknowledged help". This sort of behaviour is really very easy to spot, and does your credulity no good at all!


Task 3 The $D pragma

This should have caused few problems. The simplest way is to make changes (a) to the ATG file

  Statement<int &framesize>
  =  SYNC (   Block<framesize>
            ...
            | "stackdump"       (. if (Report->debugging) CGen->dump() .)
          ) .

and (b) to the TOPSY.FRM file.

  // list generated code for interest
  CGen->getsize(codelength, initsp);
  appendextension(SourceName, ".cod", CodeName);
  if (Report->debugging) Machine->listcode(CodeName, codelength);

but note that one must always getsize, even if one does not want to listcode.


Task 4 Kindness to those who confuse the = and == operators

I'd hoped most people would see this, or have it explained to them. The trick is quite easy:

  EqualOp<CGEN_operators &op>
  =                             (. op = CGEN_opeql; .)
   (   "=="
     | "="                      (. SemError(91); .)
     | "!="                     (. op = CGEN_opneq; .)
   ) .

In my solution I took the trouble to extend the same idea to the assignment operator:

  AssignOp
  =    "="
     | ":="                     (. SemError(210); .) .

to help those people who confuse the = and := operators. Tricks like this are probably commonplace in "student oriented" compilers.


Task 5 The do-while loop

The basic idea is quite simple; the only complication is that the machine has only a "branch if false" operation, and we really need a "branch if true" operation here. However, we do have a NOT opcode, so one solution is to use this, basically as follows:

  DoStatement<int &framesize>
  =                             (. CGEN_labels startloop, dummylabel ; .)
     "do"                       (. CGen->storelabel(startloop) .)
       Statement<framesize>
     WEAK "while"
     WEAK "(" Condition
     WEAK ")" WEAK ";"
                                (. CGen->negateboolean();
                                   CGen->jumponfalse(dummylabel, startloop); .) .

Other possible solutions are

- to extend the machine in the STKMC files to provide a BNZ operation, and to extend the code generator in the CGEN files to gain access to this.

- to generate extra code at the end of the loop like this:

  DoStatement<int &framesize>
  =                             (. CGEN_labels startloop, exitpoint, dummylabel ; .)
     "do"                       (. CGen->storelabel(startloop) .)
       Statement<framesize>
     WEAK "while"
     WEAK "(" Condition
     WEAK ")" WEAK ";"
                                (. CGen->jumponfalse(exitpoint, CGen->undefined);
                                   CGen->jump(dummylabel, startloop);
                                   CGen->backpatch(exitpoint); .) .

but that is not quite as aesthetically pleasing, let alone being inefficient.

Note that syntax of the form

"do" { Statement } "while" "(" Condition ")" ";"

is non-LL(1) (I hope you see why). Some people fixed this by using

"do" "{" { Statement } "}" "while" "(" Condition ")" ";"

or equivalently

"do" Block "while" "(" Condition ")" ";"

but the syntax I used is better - there is no need to force the braces if all you want is a statement like

do cin >> i; while (i == 0);


Task 6(a) The C++ like expression grammar

Many people had a reasonable attempt at this, but some clearly did not really understand the nuances of what they were doing, and hacked away in rather ugly ways at the original code, leaving in all sorts of unnecessary tests, confusing parameters, and the like.

A "clean" solution follows. Note that this is, in some ways, simpler than the original one, as the arithmetic and boolean operators have been syntactically separated from one another. Pay particular attention to the way in which the type-checking has been performed. And please - if you are going to hack code, do it sympathetically and neatly!

In passing, the original grammar had a kludge in it to help with error recovery in the cases where operators were omitted altogether. This has to be excised from this version because of the very different way in which unary operators are used in the C++ form of expressions (where things like a + - + b are allowed!)

  Expression<TABLE_types &e>
  =                             (. TABLE_types a;
                                   CGEN_labels shortcircuit; .)
     AndExp<e>
     { "||"                     (. CGen->booleanop(shortcircuit, CGEN_opor) .)
       AndExp<a>                (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                     { SemError(218); e = TABLE_none; }
                                   else e = TABLE_bools;
                                   CGen->backpatch(shortcircuit); .)
     } .

  AndExp<TABLE_types &a>
  =                             (. TABLE_types e;
                                   CGEN_labels shortcircuit; .)
     EqualExp<a>
     { "&&"                     (. CGen->booleanop(shortcircuit, CGEN_opand) .)
       EqualExp<e>              (. if (!(booltypes.memb(a) && booltypes.memb(e)))
                                     { SemError(218); a = TABLE_none; }
                                   else a = TABLE_bools;
                                   CGen->backpatch(shortcircuit); .)
     } .


  EqualExp<TABLE_types &e>
  =                             (. TABLE_types r;
                                   CGEN_operators op; .)
     RelationExp<e>
     { EqualOp<op> RelationExp<r>
                                (. if (!compatible(e, r)) SemError(218);
                                   e = TABLE_bools; CGen->comparison(op) .)
     } .

  RelationExp<TABLE_types &r>
  =                             (. TABLE_types a;
                                   CGEN_operators op; .)
     AddExp<r>
     { RelOp<op> AddExp<a>      (. if (!compatible(r, a) || r == TABLE_bools ||
                                       a == TABLE_bools) SemError(218);
                                   r = TABLE_bools; CGen->comparison(op) .)
     } .

  AddExp<TABLE_types &a>
  =                             (. TABLE_types m;
                                   CGEN_operators op; .)
     MultExp<a>
     { AddOp<op> MultExp<m>     (. if (!(arithtypes.memb(a) && arithtypes.memb(m)))
                                     { SemError(218); a = TABLE_none; }
                                   else CGen ->binaryintegerop(op); .)
     } .

  MultExp<TABLE_types &m>
  =                             (. TABLE_types u;
                                   CGEN_operators op; .)
     UnaryExp<m>
     { MulOp<op> UnaryExp<u>    (. if (!(arithtypes.memb(m) && arithtypes.memb(u)))
                                     { SemError(218); m = TABLE_none; }
                                   else CGen ->binaryintegerop(op); .)
     } .

  UnaryExp<TABLE_types &u>
  =    Factor<u>
     | "+" UnaryExp<u>          (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; } .)
     | "-" UnaryExp<u>          (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; }
                                   else CGen->negateinteger(); .)
     | "!" UnaryExp<u>          (. if (!booltypes.memb(u)) SemError(218);
                                   else CGen->negateboolean();
                                   u = TABLE_bools; .) .

  Factor<TABLE_types &f>
  =                             (. int value;
                                   TABLE_entries entry; .)
       Designator<classset(TABLE_consts, TABLE_vars), entry, f>
                                (. switch (entry.idclass)
                                   { case TABLE_vars :
                                       if (entry.v.scalar) CGen->stackvalue(entry.v.offset);
                                       else CGen->dereference();
                                       break;
                                     case TABLE_consts :
                                       CGen->stackconstant(entry.c.value); break;
                                   } .)
     | Number<value>            (. CGen->stackconstant(value); f = TABLE_ints; .)
     | Char<value>              (. CGen->stackconstant(value); f = TABLE_chars; .)
     | "true"                   (. CGen->stackconstant(1); f = TABLE_bools .)
     | "false"                  (. CGen->stackconstant(0); f = TABLE_bools .)
     | "(" Expression<f> ")" .

  EqualOp<CGEN_operators &op>
  =                             (. op = CGEN_opeql; .)
   (   "=="
     | "="                      (. SemError(91); .)
     | "!="                     (. op = CGEN_opneq; .)
   ) .

  RelOp<CGEN_operators &op>
  =                             (. op = CGEN_oplss; .)
   (   "<"
     | "<="                     (. op = CGEN_opleq; .)
     | ">"                      (. op = CGEN_opgtr; .)
     | ">="                     (. op = CGEN_opgeq; .)
   ) .


Task 6(b) The break statement.

The best idea to build up a chain of the break statements within the code itself, in the manner described in the text book on page 224. These all appear at first to chain in the wrong way, but when the end of the loop is reached a simple walk through the chain fixes everything up. Here is how simple it really all is! (We need similar code in DoStatement and in ForStatement, of course - see the full ATG files).

  WhileStatement<int &framesize>
  =                             (. CGEN_labels startloop, oldexit, testlabel, dummylabel; .)
     "while" WEAK "("           (. CGen->storelabel(startloop); .)
     Condition WEAK ")"         (. looplevel++; oldexit = loopexit;
                                   loopexit = CGen->undefined;
                                   CGen->jumponfalse(testlabel, CGen->undefined); .)
     Statement<framesize>       (. CGen->jump(dummylabel, startloop);
                                   CGen->backpatch(testlabel);
                                   CGen->backpatchlist(loopexit);
                                   loopexit = oldexit; looplevel--; .) .

Here looplevel and loopexit are "global" variables - so that they can be seen both by WhileStatement and by BreakStatement. looplevel is used as a simple way of allowing the semantic check that one can only have the syntactically trivial break statement within the context of loops. We cannot use a simple Boolean variable for this purpose, as we need to be able to handle nested loops. The same need is that which requires us to save and restore loopexit before and after we parse the loop body statement. After this, the BreakStatement parser is trivially easy (note the clever use of the call to CGen->jump with what appears to be the same parameter in two places. Work it out for yourselves!)

  BreakStatement
  = "break"                     (. if (!looplevel) SemError(230);
                                   CGen->jump(loopexit, loopexit); .)
    WEAK ";" .

This all works in conjunction with a new routine in the code generator:

  void CGEN::backpatchlist(CGEN_labels location)
  { CGEN_labels nextlabel;
    while (location) {                       // stop when we get the sentinel
      nextlabel = Machine->mem[location+1];  // keep a copy of the link
      Machine->mem[location+1] = codetop;    // fix the branch instruction
      location = nextlabel;                  // and move to the next one
    }
  }

TOPSY1.ATG embodies another solution. Here the idea is to generate rather kludgy code, so that a while loop looks like this

         startloop   evaluate condition
                     if false, goto endloop
                     goto loopbody     // jump over next operation
         breakpoint  goto endloop      // all breaks come here
         loopbody    ....
                     goto breakpoint   // a break statement
                     ....
                     goto breakpoint   // another break statement
                     ....
                     goto startloop    // round the loop again
         endloop                       // continue after loop

The advantage of this is that there are always exactly two backpatch operations per loop. The break statements are turned into simple (but nasty) backward, rather than (nice) forward jumps. The code generation is actually even more long winded than for the elegant method. We still need the global variables; we still need to save and restore loopexit.

  WhileStatement<int &framesize>
  =                             (. CGEN_labels startloop, oldexit, testlabel, dummylabel; .)
     "while" WEAK "("           (. CGen->storelabel(startloop); .)
     Condition WEAK ")"         (. looplevel++; oldexit = loopexit;
                                   CGen->jumponfalse(testlabel, CGen->undefined);
                                   CGen->jump(dummylabel, CGen->undefined);
                                   CGen->jump(loopexit, CGen->undefined);
                                   CGen->backpatch(dummylabel); .)
     Statement<framesize>       (. CGen->jump(dummylabel, startloop);
                                   CGen->backpatch(testlabel);
                                   CGen->backpatch(loopexit);
                                   loopexit = oldexit; looplevel--; .) .

  BreakStatement
  =                             (. CGEN_labels dummylabel; .)
    "break"                     (. if (!looplevel) SemError(230);
                                   CGen->jump(dummylabel, loopexit); .)
    WEAK ";" .

Some might think that this is silly, for one can only execute one break statement in a loop. True - executing a break statement means you leave the loop; you cannot carry on within the loop to another break statement. But one has to be able to generate code for multiple break statements - consider, for example (another silly one)

    while (i < 10) {
      cin >> a; if (a == 0) break;
      cin >> b; if (b == 0) break;
      i++;
    }


Task 6(c) Inline assembler

As for Task 6(b), an elegant solution is actually much easier than an inelegant one. (There is general truth in that, actually. Elegance in programming is highly desirable; not many people achieve it, in spite of what they might think!)

Here is what I would suggest:

  InLineStatement
  =                             (. char mnemonic[10], S[200]; int adr; .)
    "inline" "{"
      { identifier              (. LexName(mnemonic, sizeof(mnemonic) - 1); .)
        (   Number<adr>         (. CGen->asmwithint(mnemonic, adr); .)
          | "-" Number<adr>     (. CGen->asmwithint(mnemonic, -adr) .)
          | String<S>           (. CGen->asmwithstr(mnemonic, S); .)
          |                     (. CGen->asmsimple(mnemonic); .)
        )
      }
    "}" .

The work is then delegated to the code generator; this makes sense, since all these mnemonics match machine- level concepts, rather than source-level ones. Since the code generator has access to the error reporter, it is easy enough to do some semantic error checking, and of course this is a good idea.

  void CGEN::asmwithint(char *mnemonic, int adr)
  { STKMC_opcodes opcode = Machine->opcode(mnemonic); // look it up
    if (opcode <= STKMC_prs) { emit(int(opcode)); emit(adr); }
    else Report->error(233);
  }

  void CGEN::asmwithstr(char *mnemonic, char *s)
  { CGEN_labels location;
    STKMC_opcodes opcode = Machine->opcode(mnemonic); // look it up
    if (opcode == STKMC_prs)
      { stackstring(s, location); emit(int(STKMC_prs)); emit(location); }
    else Report->error(232);
  }

  void CGEN::asmsimple(char *mnemonic)
  { STKMC_opcodes opcode = Machine->opcode(mnemonic); // look it up
    if (opcode > STKMC_prs && opcode != STKMC_nul) emit(int(opcode));
    else Report->error(231);
  }