RHODES UNIVERSITY

November Examinations - 1998


Section B [ 70 marks ]

Please note that there is no obligation to produce a machine readable solution for this section. Coco/R and other files are provided so that you can enhance, refine, or test your solution if you desire. If you choose to produce a machine readable solution, you should create a working directory, unpack EXAMP.ZIP (Pascal) or EXAMC.ZIP (C++), modify any files that you like, and then copy all the files back to the blank diskette that will be provided.

The manufacturers of a microprocessor that is currently being developed have been provided with an assembler for it, but are keen to work at a higher level, and have called upon various software houses to bid for a contract to provide a simple compiler that will generate assembler code from a simple source language, which is itself in the process of being developed.

Potential suppliers have been asked to meet an initial specification that calls for a compiler to support a language that can declare simple variables, and have simple assignment statements, if-then-else statements and compound statements. Because things are not yet stable, they would like to be able to mix inline assembler code with higher level source code, so as to compensate for the absence of some high level features.

The microprocessor design is also rather unstable. At this stage all that is known of the instruction set is that it has five opcodes MOV, CMP, BRN, BZE and BNZ. These can be described as follows:

The load/store instructions are of the form

             MOV  Rx, Ry     ;    Rx := Ry
             MOV  Rx, a      ;    Rx := a
             MOV  Rx, [a]    ;    Rx := Mem[a]
             MOV  [a], Rx    ;    Mem[a] := Rx
             MOV  [a], b     ;    Mem[a] := b
             MOV  [a], [b]   ;    Mem[a] := Mem[b]

where Rx and Ry are chosen from R1, R2, R3, R4, R5, the machine's five registers, and a and b are positive integers.

The comparison instructions are of the form

             CMP  x, y       ;    CPU.Z := (x = y)
             CMP  x, [b]     ;    CPU.Z := (x = Mem[b])
             CMP  [b], y     ;    CPU.Z := (Mem[b] = y)

where x and y can be any of R1, R2, R3, R4, R5 or a simple positive integer, and b is a positive integer.

The branch instructions are of the form

             BRN  x          ;    CPU.PC := x
             BZE  x          ;    if CPU.Z then CPU.PC := x
             BNZ  x          ;    if not CPU.Z then CPU.PC := x

where x is a positive integer.

In the high-level language, variable names can be freely chosen, and no restriction is imposed against accidentally using a register name or opcode mnemonic as a variable name.

Within the inline assembler code, however, while the opcode names are "reserved", other variable names may be used as operands. Within inline assembler code, branch instructions may target an alphanumeric label, and such labels may be attached to instructions.

Just to make things interesting, the company has requested that the compiler not be case sensitive, and that variables will be tied at the point of declaration to sequentially assigned addresses in memory, decreasing from location 255.

All this has rather perplexed the host of teams who are competing for the lucrative contract, so the company has released a few sample programs, showing all these features in programs otherwise devoid of any real semantic interest! One of these is as follows (others will be found in the exam kit):

  program
  (* simple example of mixed language program *)
    int
      x, r4, r5;        (* some with register names *)
    bool
      CMP, MOV, c, d;   (* some with opcode names *)
    int
      a, z;   (* and a few more *)
    begin
      x := r4;
      if x = r4
        then
          begin CMP := MOV; x := c; c := 203 end
        else
          if x = r5
            then MOV := c
            else MOV := a;
      a := z;
      if c = d
        then begin
          d := CMP;
          asm   (* each inline statement terminates with a semicolon *)
            mov r1, r2;  (* and comments may appear as usual *)
            mov r4, 34;
          Lab: mov r5, cmp;  (* labels must be followed immediately by a colon *)
            mov [34], 34;
            mov x, a;
            mov c, z;
            cmp r5, CMP;
            bnz Lab;         (* the target label does not include the colon *)
          end
        end
        else (* nothing *)
    end.

And here is the sort of output its compiler must produce (there is no need for the compiler to provide the comments; these are simply to help the reader. Labels can be used as indicated.

      MOV   [255], [254]    ;  x := r4
      CMP   [255], [254]    ;  if x = r4
      BNZ   L0              ;    then begin
      MOV   [252], [251]    ;      CMP := MOV
      MOV   [255], [250]    ;      x := c
      MOV   [250], 203      ;      c := 203
      BRN   L1              ;    end
   L0:                      ;    else
      CMP   [255], [253]    ;      if x = r5
      BNZ   L2              ;        then
      MOV   [251], [250]    ;          MOV := c
      BRN   L3              ;
   L2:                      ;        else
      MOV   [251], [248]    ;          MOV := a
   L3:                      ;
   L1:                      ;
      MOV   [248], [247]    ;  a := z
      CMP   [250], [249]    ;  if c = d
      BNZ   L4              ;    then begin
      MOV   [249], [252]    ;      d := CMP;
                            ;       asm
      MOV   R1, R2          ;         mov r1, r2;
      MOV   R4, 34          ;         mov r4, 34;
   LAB:
      MOV   R5, [252]       ;         mov r5, cmp;
      MOV   [34], 34        ;         mov [34], 34;
      MOV   [255], [248]    ;         mov x, a;
      MOV   [250], [247]    ;         mov c, z;
      CMP   R5, [252]       ;         cmp r5, cmp;
      BNZ   LAB             ;         bnz lab;
      BRN   L5              ;       end
   L4:                      ;     else (* nothing *)
   L5:                      ;

Naturally, the compiler must be able to recognise semantic errors; here is the sort of listing that should accompany a (deliberately) wrong submission:

    1  program
    2  (* simple example of wrong mixed language program *)
    3    int
    4      x, r4, r5;        (* some with register names etc *)
    5    bool
    6      CMP, MOV, c2, d;  (* some with opcode names *)
    7    int
    8      a, z;             (* and a few more *)
    9    begin
   10      x := r4;
   11      if x = r4
   12        then
   13          begin CMP := MOV; x := c; c := 203 end
*****                                 ^ Error: Undeclared variable
   14        else
   15          if x = r5
   16            then MOV := c
*****                        ^ Error: Undeclared variable
   17            else MOV := a;
   18      a := z;
   19      if c = d
*****         ^ Error: Undeclared variable
   20        then begin
   21          d := CMP;
   22          asm
   23            mov r1, r2;
   24            mov r4, 34;
   25            mov 34, 34;
*****                ^ Error: Invalid destination
   26            mov x;
*****                ^ Error: Too few operands
   27            move x, a;
*****            ^ Error: Invalid opcode
   28            mov r3a, z;
*****                ^ Error: Undeclared variable
   29            cmp r5, CMP;
   30            bnz 34, xx;
*****                ^ Error: Too many operands
*****                    ^ Error: Undeclared variable
   31          end
   32        end
   33        else (* nothing *)
   34    end.

Of course, as a programmer trained at the top educational institution in the land you should have learned to use compiler generating tools such as Coco/R, and in spite of incredible time constraints (the deadline for submissions of a prototype compiler is only 24 hours away) you should have little trouble in submitting a working system. So take up the challenge. Being open minded, the firm is prepared to accept implementations written in either Pascal or C++. They are also prepared to accept a fairly simple-minded approach to generating the assembler code - it will suffice to have a system where this is simply directed to the standard output file. And, since they have a licensed copy of Coco/R, they are prepared to accept submissions in the form of a Cocol attribute grammar plus any support modules needed, either on diskette or simply written on paper.

Hint: a complete attribute grammar may turn out to be rather long. For examination purposes it is recommended that you (a) develop a Cocol grammar describing the source language (b) add sufficient of the attributes for an intelligent examiner to be convinced that you understand the form that these would take, where semantics would be checked, and how code generation would take place (c) describe at least the interface to support routines that deal with symbol tables (function headings with adequate commentary will suffice if you do not have time to give all the details) (d) do not bother with checking that labels in the inline section are semantically meaningful, or correctly declared/defined. The examiners will be looking for evidence of quality and semantic correctness in your solutions, rather than 100% syntactically correct Cocol.

Confine yourself to simple assignment statements and comparisons like those in the example program - that is, where the right hand side of an assignment consists of a single constant or variable, and where comparisons are made for equality only, between operands that are single constants or variables - and consider only if-then- else statements where both the then and else clauses are present.

There is also no need to describe the support routines as full classes or units - simply assume that the routines can be coded up within a single #include file in the manner familiar from practicals done during the course, or included directly at the head of the grammar itself.