Computer Science 3 - 2001

Programming Language Translation


Practical for week 21 beginning 17 September 2001

Hand in this prac before lunch time on your next practical day, correctly packaged in a transparent folder with your solutions and the "cover sheet". Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.


Objectives

The objectives of this extended practical are to allow you to implement a compiler for a slightly extended version of CS301, by bootstrapping from the existing compiler for CS301, level 1.5. This compiler aims to compile CS301 source programs into code for a hypothetical stack machine like that described in chapter 4.4 of the text.

Hopefully after doing these exercises (and studying the attributed grammar and the various other support modules carefully) you will find you have learned a lot more about compilers and programming languages than you ever did before (and, I suspect, a lot more than undergraduates at any other university in this country!) I also hope that you will have begun to appreciate how useful it is to be able to base a really large and successful project on a clear formalism - namely the use of attributed context-free grammars - and will have learned to appreciate the use of sophisticated tools like Coco/R.


To hand in:

I do NOT require listings of any C++ code produced by Coco/R.


Before you begin

This version of CS301 and its compiler does not include support for functions and procedures. This means that there is effectively no practical work set on chapters 16 and 17 of the text. Because of the timing of our courses this is unavoidable, if highly regrettable. You should be warned that some of the material of these two chapters is very much "examinable".

In Practical 20 you were invited to make a whole bunch of syntactic alterations to CS301-1 to derive CS301-2. Ultimately all of these are to be incorporated into the extended compiler as well. Mindful of the fact that several members of this class are heavily involved in IS projects at the same time, I have cut the prac down to only a few non-trivial tasks, and have already implemented some of the extensions for you.

Please do not try to leave everything to the last 24 hours, or you will come horribly short. Please take the opportunity of coming to consult with me at any stage if you are in doubt as how best to continue. By all means experiment in other ways and with other extensions if you feel so inclined.


A note on test programs

Throughout this project you will need to ensure that the features you explore are correctly implemented. This means that you will have to get a feel for understanding code sequences produced by the compiler. The best way to do this is usually to write some very minimal programs, that do not necessarily do anything useful, but simply have one, or maybe two or three statements (the object code for which is easily predicted and understood).

When the CS301 compiler has finished compiling a program, say SILLY.CS3, you will find that it creates a file SILLY.COD in which the stack machine assembler code appears. Studying this is often very enlightening.

Useful test programs are ones like the following:

     PROGRAM debug
       VAR i, List[10];
       BEGIN
         WHILE 1 = 1 DO (* infinite loop, can generate an index error *)
           Read(i);
           List[i] := 100;
         END
       END.


Task 1 - Buy the kit for the last time

In the kit PRAC21.ZIP you will find the files you need:

CS301 is similar to Clang, but the extensions already implemented mean that there are quite a number of differences in the code generator and attribute grammar from those given in the text-book. You should make a point of studying this code fairly carefully in the run up to the examinations. In particular the STKMC and TABLE modules in the prac kit are not identical with those in the book. STKMC has been extended to incorporate the extended opcode set used in Practical 16 (allowing for Boolean operations and alternative simplified array indexing; it also has some better overflow checking and a mod opcode). TABLE has been extended to provide an explicit operation to clear the number of entries to zero (not that this is necessary in most applications, including this one), and to check that identifiers are uniquely declared. The grammar also incorporates a neat trick for "auto declaring" identifiers that have not been properly declared, which you might like to peruse.


Task 2 - I dare you - try it out!

Use GENMAKE to make yourself an appropriate MAKE file, as usual, and then go ahead to make the CS301 compiler from CS301.ATG as supplied, and compile it. Apply the compiler to the specimen programs DEMO.CS3, SIEVE.CS3 and BAD.CS3. Are the results as you expect?


Task 3 - A debugging pragma

As supplied, the CS301 compiler recognizes a program identifier of Debug as a special case. When a program of this name is compiled, the object code generated is listed in a file with a .COD extension, and the symbol table is directed to stdout. These features are very useful for debugging the compiler itself.

It is sometimes a nuisance always to produce debugging output - certainly this would be true of a production quality compiler. As a better alternative, make use of the PRAGMAS option of Coco/R (see text, section 12.3.4) to allow pragmas like the following to have the desired effect.

     $D+ /* Turn debugging mode on for testing the compiler */
     $D+ /* Turn debugging mode off */

Modify the CS301.ATG grammar to incorporate these pragmas, that is, to allow the user to choose whether the symbol table and the object code be listed or not. Another useful (run-time) debugging aid is the STACKDUMP statement (see text, end of section 15.1). Compilation of this could be controlled by the same pragma (in other words, the stack dumping code would only be generated in debug mode).

Hint. This is trivially easy to do.


Task 4 - Run-time array bounds checking

You will recall from Practical 16 that the machine could easily be extended to provide a variation on the IND operation that would not perform index bounds checking. This has been incorporated into the STKMC module that you have. Add a pair of $R-/$R+ pragmas to control the kind of indexing operations you would like generated.

Hint. This is also trivially easy to do.


Task 5 - You had better do this one or else....

Add the ELSE option to the IF statement. Oh, yes, it is trivial to add it to the grammar. But be careful. Some IF statements will have ELSE parts, others may not, and the code generator has to be able to produce the correct code for whatever form is actually to be compiled. The following silly examples are all valid.

          IF A = 1 THEN C := D END;
          IF A = 1 THEN END;
          IF A = 1 THEN ELSE END;
          IF A = 1 THEN ELSE B := 1 END;


Task 6 - Constant Expressions

Last week you extended CS301 to allow array declarations to be made in terms of predeclared constants, for example

         CONST
           Max = 100;
         INT
           List [Max];
           BigList [2 * Max];

Yes, as you saw last week, syntactically it's trivial. But how do you implement it?


Task 7 - Better control over output

Last week we suggested extending the language to adopt an idea used in Pascal, where a statement like

          Write (X : 5, X + A : 12, X - Y : 2 * N)

will write the values of X, X+A and X-Y in fields of widths 5, 12 and 2*N respectively.

Ho Hum. Things are hotting up, and there is a sting in the tail. There are really two kinds of expression, aren't there - integer ones, and Boolean ones, and the distinction will have to be drawn between them. There are two kinds of output operations too - PRN and PRB (print integer and print Boolean). Statements like this

          Write (3 + 4, A < B)

should produce numeric output for the first element, and Boolean output for the second one.

But statements like this

          Write (3 + 4 : A < B)

should not be allowed, because the optional formatting expression is meaningless unless it is numeric.


Task 8

In the last few weeks we have seen how to extend the syntax and static semantics of the language to allow for CASE or SWITCH statements. The CS301.ATG specification already incorporates these additions, with the further restriction that the DEFAULT arm of the statement has been made mandatory (one can always provide an empty statement sequence if this default has to do nothing!

Code generation for the CASE statement is decidedly non-trivial, and should keep you happily occupied for a while.


Bonus task 9 - Short circuit Boolean semantics

We have so far implied that the AND and OR operators should simply work in much the same way that arithmetic operators do, and that is what the supplied compiler implements. However, in most languages they are supposed to have "short circuit" semantics. As an interesting and challenging exercise, incorporate these semantics (perhaps allow a further pragma to allow a user to choose). There are probably many ways to do this. One of the easiest, given that we are generating code for an ideal and extensible stack machine, is to introduce special purpose machine operations. (Essentially one has to be able to incorporate the equivalent of conditional branch instructions into the evaluation of expressions).


Appendix - Semantic error messages in CS301

The following is an extract from the compiler frame file as supplied to you. Note that semantic error numbers start at 200. In fact they could start anywhere, save that Coco/R generates a whole lot of low-numbered messages for itself, from the grammar, and you need to make sure that your error numbers don't clash with these.

    case 200: return "Constant out of range";
    case 201: return "Identifier redeclared";
    case 202: return "Undeclared identifier";
    case 203: return "Unexpected parameters";
    case 204: return "Unexpected subscript";
    case 205: return "Subscript required";
    case 206: return "Invalid class of identifier";
    case 207: return "Variable expected";
    case 208: return "Too many formal parameters";
    case 209: return "Wrong number of parameters";
    case 210: return "Invalid assignment";
    case 211: return "Cannot read this type of variable";
    case 212: return "Program too long";
    case 213: return "Too deeply nested";
    case 214: return "Invalid parameter";
    case 215: return "COBEGIN only allowed in main program";
    case 216: return "Too many concurrent processes";
    case 217: return "Only global procedure calls allowed here";
    case 218: return "Type mismatch";
    case 219: return "Unexpected expression";
    case 220: return "Missing expression";
    case 221: return "Boolean expression required";
    case 222: return "Invalid expression";
    case 223: return "Index out of range";
    case 224: return "Division by zero";
    case 225: return "Invalid subscript type";
    case 226: return "Invalid array declaration";
    case 227: return "Invalid case label range";
    case 228: return "Case label duplicated";
    default:  return "Compiler error";


1 COMPILER CS301 $CNX 2 /* CS301-2 level 1 grammar - this version for Prac 21 development 3 P.D. Terry, Rhodes University, 2001 */ 4 5 #include "misc.h" 6 #include "set.h" 7 #include "cgen.h" 8 #include "table.h" 9 #include "report.h" 10 11 typedef Set<7> classset; 12 typedef Set<4> typeset; 13 14 bool debug; 15 extern TABLE *Table; 16 extern CGEN *CGen; 17 extern REPORT *Report; // needed for debugging pragma 18 19 typeset arithtypes, booltypes; 20 21 /*--------------------------------------------------------------------------*/ 22 23 IGNORE CASE 24 IGNORE CHR(9) .. CHR(13) 25 COMMENTS FROM "{" TO "}" 26 27 CHARACTERS 28 cr = CHR(13) . 29 lf = CHR(10) . 30 letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . 31 digit = "0123456789" . 32 instring = ANY - "'" - cr - lf - CHR(0) . 33 34 TOKENS 35 identifier = letter { letter | digit } . 36 number = digit { digit } . 37 string = "'" (instring | "''") { instring | "''" } "'" . 38 39 PRODUCTIONS 40 CS301 41 = (. TABLE_entries entry; 42 arithtypes = typeset(TABLE_none, TABLE_ints); 43 booltypes = typeset(TABLE_none, TABLE_bools); .) 44 "PROGRAM" 45 Ident<entry.name> (. debug = (strcmp(entry.name, "DEBUG") == 0); 46 entry.idclass = TABLE_progs; 47 entry.type = TABLE_none; 48 Table->enter(entry); .) 49 WEAK ";" Block "." . 50 51 Block 52 = (. int framesize = 0; .) 53 SYNC { ( ConstDeclarations | VarDeclarations<framesize> ) 54 SYNC } (. /* reserve space for variables */ 55 CGen->openstackframe(framesize); .) 56 "BEGIN" StatementSequence (. CGen->leaveprogram(); 57 if (debug) /* demonstration purposes */ 58 Table->printtable(stdout); .) 59 "END" . 60 61 ConstDeclarations 62 = "CONST" OneConst { OneConst } . 63 64 OneConst 65 = (. TABLE_entries entry; .) 66 Ident<entry.name> (. entry.idclass = TABLE_consts; .) 67 WEAK "=" 68 ( Number<entry.c.value> (. entry.type = TABLE_ints; .) 69 | "TRUE" (. entry.type = TABLE_bools; entry.c.value = 1; .) 70 | "FALSE" (. entry.type = TABLE_bools; entry.c.value = 0; .) 71 ) ";" (. Table->enter(entry); .) . 72 73 74 VarDeclarations<int &framesize> 75 = (. TABLE_types t; .) 76 ( "INT" (. t = TABLE_ints; .) 77 | "BOOL" (. t = TABLE_bools; .) 78 ) OneVar<framesize, t> { WEAK "," OneVar<framesize, t> } ";" . 79 80 OneVar<int &framesize, TABLE_types t> 81 = (. TABLE_entries entry; .) 82 (. entry.idclass = TABLE_vars; 83 entry.v.size = 1; entry.v.scalar = true; 84 entry.type = t; 85 entry.v.offset = framesize + 1; .) 86 Ident<entry.name> 87 [ UpperBound<entry.v.size> (. entry.v.scalar = false; .) 88 ] (. Table->enter(entry); 89 framesize += entry.v.size; .) . 90 91 UpperBound<int &size> 92 = "[" Number<size> "]" (. size++; .) . 93 94 StatementSequence 95 = Statement { WEAK ";" Statement } . 96 97 Statement 98 = SYNC [ Assignment 99 | IfStatement | WhileStatement 100 | ReadStatement | WriteStatement 101 | ReturnStatement | CaseStatement 102 | "STACKDUMP" (. CGen->dump(); .) 103 104 ] . 105 106 Assignment 107 = (. TABLE_types exptype, vartype; 108 CGEN_operators op; .) 109 Variable<vartype> 110 ( AssignOp<op> 111 Expression<exptype> (. if (exptype != vartype) SemError(218); 112 CGen->binaryintegerop(op); 113 CGen->assign(); .) 114 | "++" (. if (!arithtypes.memb(vartype)) SemError(218); 115 CGen->increment(); .) 116 | "--" (. if (!arithtypes.memb(vartype)) SemError(218); 117 CGen->decrement(); .) 118 ) SYNC . 119 120 121 Variable<TABLE_types &vartype> 122 = (. TABLE_entries entry; .) 123 Designator<classset(TABLE_vars), entry> 124 (. vartype = entry.type; .) . 125 126 Designator<classset allowed, TABLE_entries &entry> 127 = (. TABLE_alfa name; 128 TABLE_types exptype; 129 bool found; .) 130 Ident<name> (. Table->search(name, entry, found); 131 if (!found) 132 { SemError(202); /* force declaration */ 133 strcpy(entry.name, name); 134 entry.type = TABLE_none; 135 entry.idclass = TABLE_vars; 136 entry.v.scalar = true; 137 entry.v.size = 1; 138 entry.v.offset = 0; 139 } 140 if (!allowed.memb(entry.idclass)) SemError(206); 141 if (entry.idclass != TABLE_vars) return; 142 CGen->stackaddress(entry.v.offset); .) 143 ( "[" (. if (!found) { entry.v.scalar = false; Table->enter(entry); } 144 if (entry.v.scalar) SemError(204); .) 145 Expression<exptype> (. if (!arithtypes.memb(exptype)) SemError(227); 146 /* determine size for bounds check */ 147 CGen->stackconstant(entry.v.size); 148 CGen->subscript(); .) 149 "]" 150 | (. if (!found) Table->enter(entry); 151 if (!entry.v.scalar) SemError(205); .) 152 ) . 153 154 ReturnStatement 155 = "RETURN" (. CGen->leaveprogram(); .) . 156 157 IfStatement 158 = (. CGEN_labels testlabel; .) 159 "IF" Condition "THEN" (. CGen->jumponfalse(testlabel, CGen->undefined); .) 160 StatementSequence (. CGen->backpatch(testlabel); .) 161 "END" . 162 163 WhileStatement 164 = (. CGEN_labels startloop, testlabel, dummylabel; .) 165 "WHILE" (. CGen->storelabel(startloop); .) 166 Condition "DO" (. CGen->jumponfalse(testlabel, CGen->undefined); .) 167 StatementSequence (. CGen->jump(dummylabel, startloop); 168 CGen->backpatch(testlabel); .) 169 "END" . 170 171 CaseStatement 172 = (. TABLE_types exptype; 173 LabelList L; .) 174 "CASE" 175 Expression<exptype> 176 "OF" 177 OneCase<L> 178 { "|" OneCase<L> 179 } 180 "DEFAULT" ":" 181 StatementSequence 182 "END" . 183 184 OneCase<LabelList &L> 185 = [ Range<L> 186 { WEAK "," Range<L> 187 } 188 ":" 189 StatementSequence 190 ] 191 . 192 193 Range<LabelList &L> 194 = (. int low; .) 195 IntConst<low> (. int high = low; .) 196 [ ".." IntConst<high> ] (. if (high < low) SemError(227); 197 else for (int x = low; x <= high; x++) { 198 if (L.isMember(x)) SemError(228); 199 else L.add(x); 200 } 201 .) . 202 203 IntConst<int &i> 204 = (. int sign = 1; 205 char str[100]; .) 206 [ "+" | "-" (. sign = -1 .) 207 ] number (. LexString(str, 100); 208 i = sign * atoi(str); 209 .) . 210 211 Condition 212 = (. TABLE_types exptype; .) 213 Expression<exptype> (. if (!booltypes.memb(exptype)) SemError(221) .) . 214 215 ReadStatement 216 = "READ" "(" ReadElement { WEAK "," ReadElement } ")" . 217 218 ReadElement 219 = (. TABLE_types vartype; .) 220 Variable<vartype> (. switch (vartype) 221 { case TABLE_ints : CGen->readvalue(); break; 222 case TABLE_bools : CGen->readboolean(); break; 223 default : SemError(211); break; 224 } .) . 225 226 WriteStatement 227 = (. bool newline = false .) 228 ( 229 ( "WRITE" | "WRITELN" (. newline = true; .) 230 ) [ "(" WriteElement { WEAK "," WriteElement } ")" ] 231 | "NEWLINE" (. newline = true; .) 232 ) (. if (newline) CGen->newline(); .) . 233 234 WriteElement 235 = (. TABLE_types exptype; 236 char str[600]; 237 CGEN_labels startstring; .) 238 String<str> (. CGen->stackstring(str, startstring); 239 CGen->writestring(startstring); .) 240 | Expression<exptype> 241 (. switch (exptype) 242 { case TABLE_ints : CGen->writevalue(); break; 243 case TABLE_bools : CGen->writeboolean(); break; 244 case TABLE_none : /* error already */ break; 245 } .) . 246 247 Expression<TABLE_types &e> 248 = (. TABLE_types a; .) 249 AndExp<e> 250 { "OR" AndExp<a> (. if (!(booltypes.memb(e) && booltypes.memb(a))) 251 { SemError(218); e = TABLE_none; } 252 else e = TABLE_bools; 253 CGen->binarybooleanop(CGEN_orrop); .) 254 } . 255 256 AndExp<TABLE_types &a> 257 = (. TABLE_types e; .) 258 RelExp<a> 259 { "AND" RelExp<e> (. if (!(booltypes.memb(a) && booltypes.memb(e))) 260 { SemError(218); a = TABLE_none; } 261 else a = TABLE_bools; 262 CGen->binarybooleanop(CGEN_andop); .) 263 } . 264 265 266 RelExp<TABLE_types &r> 267 = (. TABLE_types a; 268 CGEN_operators op; .) 269 AddExp<r> 270 [ RelOp<op> AddExp<a> (. if (r == TABLE_bools || a == TABLE_bools) SemError(218); 271 r = TABLE_bools; CGen->comparison(op) .) 272 ] . 273 274 AddExp<TABLE_types &a> 275 = (. TABLE_types m; 276 CGEN_operators op; .) 277 MultExp<a> 278 { AddOp<op> MultExp<m> (. if (!(arithtypes.memb(a) && arithtypes.memb(m))) 279 { SemError(218); a = TABLE_none; } 280 else CGen ->binaryintegerop(op); .) 281 } . 282 283 MultExp<TABLE_types &m> 284 = (. TABLE_types u; 285 CGEN_operators op; .) 286 UnaryExp<m> 287 { MulOp<op> UnaryExp<u> (. if (!(arithtypes.memb(m) && arithtypes.memb(u))) 288 { SemError(218); m = TABLE_none; } 289 else CGen ->binaryintegerop(op); .) 290 } . 291 292 UnaryExp<TABLE_types &u> 293 = Factor<u> 294 | "+" UnaryExp<u> (. if (!arithtypes.memb(u)) { 295 SemError(218); u = TABLE_none; } .) 296 | "-" UnaryExp<u> (. if (!arithtypes.memb(u)) { 297 SemError(218); u = TABLE_none; } 298 else CGen->negateinteger(); .) 299 | "NOT" UnaryExp<u> (. if (!booltypes.memb(u)) SemError(218); 300 else CGen->negateboolean(); 301 u = TABLE_bools; .) . 302 303 Factor<TABLE_types &f> 304 = (. int value; 305 TABLE_entries entry; .) 306 Designator<classset(TABLE_consts, TABLE_vars), entry> 307 (. f = entry.type; 308 switch (entry.idclass) 309 { case TABLE_vars : 310 CGen->dereference(); break; 311 case TABLE_consts : 312 CGen->stackconstant(entry.c.value); break; 313 } .) 314 | Number<value> (. CGen->stackconstant(value); f = TABLE_ints; .) 315 | "TRUE" (. CGen->stackconstant(1); f = TABLE_bools .) 316 | "FALSE" (. CGen->stackconstant(0); f = TABLE_bools .) 317 | "(" Expression<f> ")" . 318 319 AddOp<CGEN_operators &op> 320 = (. op = CGEN_opnop; .) 321 ( "+" (. op = CGEN_opadd; .) 322 | "-" (. op = CGEN_opsub; .) 323 ) . 324 325 MulOp<CGEN_operators &op> 326 = (. op = CGEN_opnop; .) 327 ( "*" (. op = CGEN_opmul; .) 328 | "%" (. op = CGEN_opmod; .) 329 | "/" (. op = CGEN_opdvd; .) 330 ) . 331 332 RelOp<CGEN_operators &op> 333 = (. op = CGEN_opnop; .) 334 ( "=" (. op = CGEN_opeql; .) 335 | "<>" (. op = CGEN_opneq; .) 336 | "<" (. op = CGEN_oplss; .) 337 | "<=" (. op = CGEN_opleq; .) 338 | ">" (. op = CGEN_opgtr; .) 339 | ">=" (. op = CGEN_opgeq; .) 340 ) . 341 342 AssignOp<CGEN_operators &op> 343 = (. op = CGEN_opnop; .) 344 ( ":=" 345 | "=" (. SemError(210); .) 346 | (. CGen->duplicate(); CGen->dereference() .) 347 ( "+=" (. op = CGEN_opadd; .) 348 | "-=" (. op = CGEN_opsub; .) 349 | "*=" (. op = CGEN_opmul; .) 350 | "/=" (. op = CGEN_opdvd; .) 351 | "%=" (. op = CGEN_opmod; .) 352 ) 353 ) . 354 355 Ident<char *name> 356 = identifier (. LexName(name, TABLE_alfalength); .) . 357 358 String<char *str> 359 = string (. char local[100]; 360 LexString(local, sizeof(local) - 1); 361 int i = 0; 362 while (local[i]) /* strip quotes */ 363 { local[i] = local[i+1]; i++; } 364 local[i-2] = '\0'; 365 i = 0; 366 while (local[i]) /* find internal quotes */ 367 { if (local[i] == '\'') /* single quote */ 368 { int j = i; 369 while (local[j]) 370 { local[j] = local[j+1]; j++; } 371 } 372 i++; 373 } 374 strcpy(str, local); .) . 375 376 Number <int &num> 377 = number (. char str[100]; 378 int i = 0, l, digit, overflow = 0; 379 num = 0; 380 LexString(str, sizeof(str) - 1); 381 l = strlen(str); 382 while (i <= l && isdigit(str[i])) 383 { digit = str[i] - '0'; i++; 384 if (num <= (maxint - digit) / 10) 385 num = 10 * num + digit; 386 else overflow = 1; 387 } 388 if (overflow) SemError(200); .) . 389 390 END CS301.


Home  © P.D. Terry