Please note that there is no obligation to produce a machine readable solution for this section. Coco/R and other files are provided so that you can enhance, refine, or test your solution if you desire. If you choose to produce a machine readable solution, you should create a working directory, unpack EXAMP.ZIP (Pascal) or EXAMC.ZIP (C++), modify any files that you like, and then copy all the files back to the blank diskette that will be provided.
The manufacturers of a microprocessor that is currently being developed have been provided with an assembler for it, but are keen to work at a higher level, and have called upon various software houses to bid for a contract to provide a simple compiler that will generate assembler code from a simple source language, which is itself in the process of being developed.
Potential suppliers have been asked to meet an initial specification that calls for a compiler to support a language that can declare simple variables, and have simple assignment statements, if-then-else statements and compound statements. Because things are not yet stable, they would like to be able to mix inline assembler code with higher level source code, so as to compensate for the absence of some high level features.
The microprocessor design is also rather unstable. At this stage all that is known of the instruction set is that it has five opcodes MOV, CMP, BRN, BZE and BNZ. These can be described as follows:
The load/store instructions are of the form
MOV Rx, Ry ; Rx := Ry MOV Rx, a ; Rx := a MOV Rx, [a] ; Rx := Mem[a] MOV [a], Rx ; Mem[a] := Rx MOV [a], b ; Mem[a] := b MOV [a], [b] ; Mem[a] := Mem[b]
where Rx and Ry are chosen from R1, R2, R3, R4, R5, the machine's five registers, and a and b are positive integers.
The comparison instructions are of the form
CMP x, y ; CPU.Z := (x = y) CMP x, [b] ; CPU.Z := (x = Mem[b]) CMP [b], y ; CPU.Z := (Mem[b] = y)
where x and y can be any of R1, R2, R3, R4, R5 or a simple positive integer, and b is a positive integer.
The branch instructions are of the form
BRN x ; CPU.PC := x BZE x ; if CPU.Z then CPU.PC := x BNZ x ; if not CPU.Z then CPU.PC := x
where x is a positive integer.
In the high-level language, variable names can be freely chosen, and no restriction is imposed against accidentally using a register name or opcode mnemonic as a variable name.
Within the inline assembler code, however, while the opcode names are "reserved", other variable names may be used as operands. Within inline assembler code, branch instructions may target an alphanumeric label, and such labels may be attached to instructions.
Just to make things interesting, the company has requested that the compiler not be case sensitive, and that variables will be tied at the point of declaration to sequentially assigned addresses in memory, decreasing from location 255.
All this has rather perplexed the host of teams who are competing for the lucrative contract, so the company has released a few sample programs, showing all these features in programs otherwise devoid of any real semantic interest! One of these is as follows (others will be found in the exam kit):
program (* simple example of mixed language program *) int x, r4, r5; (* some with register names *) bool CMP, MOV, c, d; (* some with opcode names *) int a, z; (* and a few more *) begin x := r4; if x = r4 then begin CMP := MOV; x := c; c := 203 end else if x = r5 then MOV := c else MOV := a; a := z; if c = d then begin d := CMP; asm (* each inline statement terminates with a semicolon *) mov r1, r2; (* and comments may appear as usual *) mov r4, 34; Lab: mov r5, cmp; (* labels must be followed immediately by a colon *) mov [34], 34; mov x, a; mov c, z; cmp r5, CMP; bnz Lab; (* the target label does not include the colon *) end end else (* nothing *) end.
And here is the sort of output its compiler must produce (there is no need for the compiler to provide the comments; these are simply to help the reader. Labels can be used as indicated.
MOV [255], [254] ; x := r4 CMP [255], [254] ; if x = r4 BNZ L0 ; then begin MOV [252], [251] ; CMP := MOV MOV [255], [250] ; x := c MOV [250], 203 ; c := 203 BRN L1 ; end L0: ; else CMP [255], [253] ; if x = r5 BNZ L2 ; then MOV [251], [250] ; MOV := c BRN L3 ; L2: ; else MOV [251], [248] ; MOV := a L3: ; L1: ; MOV [248], [247] ; a := z CMP [250], [249] ; if c = d BNZ L4 ; then begin MOV [249], [252] ; d := CMP; ; asm MOV R1, R2 ; mov r1, r2; MOV R4, 34 ; mov r4, 34; LAB: MOV R5, [252] ; mov r5, cmp; MOV [34], 34 ; mov [34], 34; MOV [255], [248] ; mov x, a; MOV [250], [247] ; mov c, z; CMP R5, [252] ; cmp r5, cmp; BNZ LAB ; bnz lab; BRN L5 ; end L4: ; else (* nothing *) L5: ;
Naturally, the compiler must be able to recognise semantic errors; here is the sort of listing that should accompany a (deliberately) wrong submission:
1 program 2 (* simple example of wrong mixed language program *) 3 int 4 x, r4, r5; (* some with register names etc *) 5 bool 6 CMP, MOV, c2, d; (* some with opcode names *) 7 int 8 a, z; (* and a few more *) 9 begin 10 x := r4; 11 if x = r4 12 then 13 begin CMP := MOV; x := c; c := 203 end ***** ^ Error: Undeclared variable 14 else 15 if x = r5 16 then MOV := c ***** ^ Error: Undeclared variable 17 else MOV := a; 18 a := z; 19 if c = d ***** ^ Error: Undeclared variable 20 then begin 21 d := CMP; 22 asm 23 mov r1, r2; 24 mov r4, 34; 25 mov 34, 34; ***** ^ Error: Invalid destination 26 mov x; ***** ^ Error: Too few operands 27 move x, a; ***** ^ Error: Invalid opcode 28 mov r3a, z; ***** ^ Error: Undeclared variable 29 cmp r5, CMP; 30 bnz 34, xx; ***** ^ Error: Too many operands ***** ^ Error: Undeclared variable 31 end 32 end 33 else (* nothing *) 34 end.
Of course, as a programmer trained at the top educational institution in the land you should have learned to use compiler generating tools such as Coco/R, and in spite of incredible time constraints (the deadline for submissions of a prototype compiler is only 24 hours away) you should have little trouble in submitting a working system. So take up the challenge. Being open minded, the firm is prepared to accept implementations written in either Pascal or C++. They are also prepared to accept a fairly simple-minded approach to generating the assembler code - it will suffice to have a system where this is simply directed to the standard output file. And, since they have a licensed copy of Coco/R, they are prepared to accept submissions in the form of a Cocol attribute grammar plus any support modules needed, either on diskette or simply written on paper.
Hint: a complete attribute grammar may turn out to be rather long. For examination purposes it is recommended that you (a) develop a Cocol grammar describing the source language (b) add sufficient of the attributes for an intelligent examiner to be convinced that you understand the form that these would take, where semantics would be checked, and how code generation would take place (c) describe at least the interface to support routines that deal with symbol tables (function headings with adequate commentary will suffice if you do not have time to give all the details) (d) do not bother with checking that labels in the inline section are semantically meaningful, or correctly declared/defined. The examiners will be looking for evidence of quality and semantic correctness in your solutions, rather than 100% syntactically correct Cocol.
Confine yourself to simple assignment statements and comparisons like those in the example program - that is, where the right hand side of an assignment consists of a single constant or variable, and where comparisons are made for equality only, between operands that are single constants or variables - and consider only if-then- else statements where both the then and else clauses are present.
There is also no need to describe the support routines as full classes or units - simply assume that the routines can be coded up within a single #include file in the manner familiar from practicals done during the course, or included directly at the head of the grammar itself.