Computer Science 3 - 2000

Programming Language Translation


Practical for Weeks 25 - 26, beginning 16 October 2000

There are two hand-in dates for this practical. Solutions to Task 2 must be submitted by Tuesday 24 October at 17h00. Late entries will not be accepted. This submission must consist of an edited version of the file QUIZ from the prac kit, with your answers to the questions added in the relevant places.

Solutions to the other tasks must be submitted by Tuesday 31 October at 17h00. Late entries will not be accepted.

This submission must consist of listings and disk copies of your attribute grammar and code generator, and also the table handler and stack machine emulator if you change the supplied ones. Please use a highlighter pen to mark these listings in the places where you have made changes to the original. You should also submit listings of a selection of test programs and the code they generate to show that you completed the tasks satisfactorily.

Hand in one submission per group in the usual way.

Your surname: (PRINT CLEARLY)          Your prac day:

Your student number: (PRINT CLEARLY)

Your signature:                        Your group partners:


Other people with whom you collaborated (you need not give tutors' names):



Mark awarded:                          Tutor's signature


Objectives

The objectives of this extended practical are to allow you to examine, understand, and then extend and modify a compiler for Topsy++, level 1++. This compiler aims to compile Topsy++ source programs into code for the hypothetical stack machine like that described in chapter 4.4 of the text. Hopefully after doing these exercises you will have learned a lot more about compilers and programming languages than you ever did before (and, I suspect, a lot more than undergraduates at any other university in the country!) I also hope that you will have begun to appreciate how useful it is to be able to base a really large and successful project on a clear formalism - namely the use of attributed context-free grammars - and will have learned to appreciate the use of sophisticated tools like Coco/R.


Before you begin

The tasks are presented below in an order which, if followed, should make the practical an enjoyable and enriching experience. Please do not try to leave everything to the last few hours, or you will come horribly short. You must work consistently, and with a view to getting an overview of the entire project, as the various components and tasks all interact in ways that will probably not at first be apparent. Please take the opportunity of coming to consult with me at any stage if you are in doubt as how best to continue. By all means experiment in other ways and with other extensions if you feel so inclined.

Topsy++ has been restricted so as not to include functions. This means that there is effectively no practical work set on chapters 16 and 17 of the text. Because of the timing of our courses this is unavoidable, if highly regrettable. You should be warned that some of the material of these two chapters is "examinable".


The Topsy++ language report

Along with this prac sheet and the listings of various support modules you will need the copy of the Topsy++ language report, which is intended to describe the syntax, static semantics, and the dynamic semantic features of Topsy++ that one needs to understand in order to use and/or implement the language.


Task 1 - Getting over the shock of such a large handout

Unpack the practical kit. In it you will find the following goodies:

Use GENMAKE to make yourself an appropriate MAKE file, as usual, and then go ahead to make the Topsy++ compiler from TOPSY.ATG, and compile it. Apply the compiler to the specimen programs. Are the results as you expect?

A note on test programs

Throughout this project you will need to ensure that the features you explore are correctly implemented. This means that you will have to get a feel for understanding code sequences produced by the compiler. The best way to do this is usually to write some very minimal programs, that do not necessarily do anything useful, but simply have one, or maybe two, statements (the object code for which is easily predicted and understood). VOTE.TOP and SORT.TOP are programs that actually use most of the features of this first stage. As such, they are really too large for test programs!

More useful test programs are ones like the following:

    void main (void) {
      int i, List[10];
      while (true) { // infinite loop, we should be able to generate an index error
        cin >> i;
        List[i] = 100;
      }
    }

When the Topsy++ compiler has finished compiling a program, say SILLY.TOP, you will find that it creates a file SILLY.COD in which the stack machine assembler code appears. Studying this is often very enlightening.

The debugging pragma

It is useful when writing a compiler to be able to produce debugging output - but sometimes this just clutters up a production quality compiler. The TOPSY.ATG grammar makes use of the PRAGMAS option of Coco/R (see text, page 166) to allow a line like that shown to have the desired effect.

$D+ // Turn diagnostic mode on for testing the compiler


Task 2 - Getting to understand Topsy's compiler

Here are some awkward questions. Answer them by editing the file QUIZ in the kit, and submit it as your solution by next Tuesday. Your solutions can refer to the line numbers on the listings that form part of this handout.

(a) In Clang a declaration like

VAR List[10];

specifies the upper range of an index, and is equivalent to Pascal's

VAR List : ARRAY [0 .. 10] OF INTEGER;

In Topsy++, however, a declaration like

int List[10];

specifies the size of the array, not the upper limit on an index, and would be closer to Pascal's

VAR List : ARRAY [0 .. 9] OF INTEGER;

So what on earth would happen if you submitted a declaration like

int x[0];

to the Topsy++ compiler? Does it mean anything to have an array of size 0? What? What does it mean in C++? Is it allowed in C++?

(b) Study how the compiler handles escape sequences like \t and \a and \n that appear in string literals. Does it handle them all correctly? If not, where or why not?

(c) The for loop is quite a complex construct. Why do you suppose you are not allowed to alter the control variable? (See section 10.6 of the Report.) Where is this checked by the compiler? Is the check robust, or could you write a Topsy++ program that defeated the check? If so, how?

(d) Study carefully how the compiler handles declarations of variable local to inner block statements - Consider, for example, programs like

  $D+ // Turn diagnostic mode on for testing the compiler
  // The dynamic semantics of this program are not really meaningful!

  void main (void) {
    int i;                 // i is global
    { int j, k = 10;       // j and k are local to this inner statement
      j = 2 * k;
      int l = j + k;       // l is also local to this statement
      cin >> i;            // i is still in scope
      if (i > 0) {
        l = k;             // l and k are still in scope
        int m = 90;        // m is local to the body of the if statement
      }
    }
    i = 2 * j + m;         // i is in scope, but j and m are not
  }

  $D+ // Turn diagnostic mode on for testing the compiler
  // The dynamic semantics of this program are not really meaningful!

  void main (void) {
    int i;                 // i is global
    { int i, j, k = 10;    // j and k, and a new i, are local to this statement
      j = 2 * k;
      int l = j + k;       // l is also local to this inner statement
      cin >> i;            // the second i is the one in scope
      if (i > 0) {         // the second i is the one in scope
        l = k;             // l and k are still in scope
        int m = 90;        // m is local to the body of the if statement
        bool i = true;     // yet another i is allowed, local to this body
      }
      if (i > 0) {
        bool i = true;     // yet another i
        int m = 90;        // yet another m
      }
    }
  }

What offsets (addresses) from the base pointer CPU.BP are associated with each variable?

(e) In the second example above it appears that we can have more than one variable named in existence at run time with the name i, and that these variables can even be of different types. Does this really make sense? If so, how do the compiler and interpreter work out which i is which?

(f) In the definition of the symbol table you will find an enumeration of the possible types that variables and expressions can have:

enum TABLE_types {TABLE_none, TABLE_ints, TABLE_chars, TABLE_bools };

in C++, or the equivalent in Pascal:

TYPES = (none, ints, chars, bools);

Surely this is ridiculous? We can only declare variables to be of int, bool or char types. So why is there a none type in the enumeration?

(g) In the Topsy++ manual you will find that a Block is defined as

Block = "{" { Declaration | Statement } "}" .

but in the TOPSY.ATG file you will find

Block = "{" { Statement } "}" .

with the production for Statement augmented to include the forms of Declaration. Is there something wrong with the ATG file, or with the manual, with both, or with neither?

(h) In the Topsy++ manual, section 9.2.1 you will find a description of semantics of the || and && operations. What sort of code is produced for statements like

           i = 5;
           j = 6;
           if (i > 3) || (j > 4) cout << "passed";

and how would this be interpreted at run time (study the operation of the stack machine - it is easy to generate the code using the compiler if you don't feel like doing it by hand).

(i) Suppose we write a loop like this

           char c;
           for (c = 'a'; true; c++);

which looks a bit odd, but is syntactically legal. What sort of code is produced, and how does it behave at run time?

(j) Consider this gem of a program. What is the problem, and how does Topsy handle it?

       void main (void) {
         char c = 'z';
         while (true) { // infinite loop, we should be able to generate an error
           c++;         // haven't I told you to avoid c++?
           cout << c;
         }
       }

(k) The version of Topsy that you have cunningly uses the PSH and POP opcodes where it can. Study the code generation process carefully and describe it briefly in your own words.

(l) Well, Topsy++ may be a nice little toy, but it has a number of annoying differences from C-plus-whatsit. For example:

(a) C++ has a do-while loop and Topsy++ currently has a do-until loop.

(b) C++ has a baroque precedence ordering for its operators (something like 19 levels, while Topsy++ currently has only about 4).

(c) C++ has that cute little break statement that allows you to leave a loop early.

As your final task to be completed by the Tuesday, work out how the basic unattributed grammar for Topsy++ (you can find it in TOPSY0.ATG) would have to be modified to allow the do-while loop and the break statement, and to replace the productions for Expressions with a set that would reflect the precedence of the operators in the way that C++ requires. (You don't have to add all 19 levels, of course. Topsy++ does not have all of the operators that C++ has, in any case.)


Task 3 - The day of reckoning draw nearer ...

Well, we want to implement some of the fun extensions in Task 2 eventually. But let's hack at the system in some simpler ways first.

We have already commented on the $D pragma. At present it is only used to request the printout of a symbol table. How would you change the system so that one would have to use the pragma if one wanted to obtain the assembler code file?

Another useful (run-time) debugging aid is the stackdump statement (see text, page 223). Compilation of this could be controlled by the same pragma (in other words, the stack dumping code would only be generated in debug mode).

Hint: These changes are almost trivially easy.


Task 4 - and still the apocalypse draws nearer ...

One of the nastier mistakes we all probably make in C++ is to write code like

if (a = b) Statement;

when of course we probably meant to write

if (a == b) Statement;

This "feature" of C++ is so often a "bug" that many C++ compilers issue a warning if they come across code like that in the first example (they can only warn, as the code is not incorrect, and, yes, folks, as someone once said "C++ is its own virus"). What does Topsy++ do at present? Can you find a way to be more user friendly?


Task 5 - it draws ever closer ...

Okay, time to turn to fun things at last. Replace the do-until loop with the do-while loop. Why do you suppose the inventor of the original Topsy++ did such a silly thing as to use the do-until formulation anyway?

Hint: You should find this very easy indeed.


Task 6 - It just arrived!

There are three tasks left. Please attempt at least two of them. Full credit can be obtained on two; extra credit if you do all three:

(a) Well, if you understand the way the system works, things should be starting to fall into place. Scrap all the current Expression, Term and Factor stuff, and replace it with something that will reflect the new operator precedences that you (should have) designed in Task 2.

(Hint: You should be able to do this quite easily. After all, the stack machine operations don't have to change. So the attributes and actions probably look very like the ones you have already. But this is, admittedly, a bit more work than the other tasks, just because there is more detail to attend to.)

(b) Come, come. One of the simplest statements we could introduce into the language - break - still has to be implemented. I dare you!

(Hint: Read the discussion in the textbook in Exercise 15.8.)

(c) In an earlier prac we pointed out that some compilers allow user to intermingle inline assembler code with high level code. Implement this. Here's an example:

     void main (void) { $D+
       cout << 4 << "followed by a piece of string";
       inline {
         LIT 4
         PRN
         PRS "followed by a piece of string"
         HLT
       }
       return;
     }

(Hint: In spite of the discussion in Prac 22 you will actually find this much easier to do if you regard the mnemonics as identifiers, as there is a function in the system that relates mnemonics to opcodes.)


Appendix - Semantic error message in Topsy++

The following is an extract from the compiler frame file as supplied to you. Note that semantic error numbers start at 200. In fact they could start anywhere, save that Coco/R generates a whole lot of low-numbered message for itself, from the grammar, and you need to make sure that your error numbers don't clash with these.

char *topsyError::GetUserErrorMsg(int n)
{ switch (n) {
  // first few are extra syntax errors
    case 91:  return "Relational operator expected";
    case 92:  return "Malformed expression";
    case 93:  return "Bad declaration order";
  // remainder are constraint (static semantic) errors
    case 200: return "Constant out of range";
    case 201: return "Identifier redeclared";
    case 202: return "Undeclared identifier";
    case 203: return "Unexpected parameters";
    case 204: return "Unexpected subscript";
    case 205: return "Subscript required";
    case 206: return "Invalid class of identifier";
    case 207: return "Variable expected";
    case 208: return "Too many formal parameters";
    case 209: return "Wrong number of parameters";
    case 210: return "Invalid assignment";
    case 211: return "Cannot read this type of variable";
    case 212: return "Program too long";
    case 213: return "Too deeply nested";
    case 214: return "Invalid parameter";
    case 215: return "COBEGIN only allowed in main program";
    case 216: return "Too many concurrent processes";
    case 217: return "Only global procedure calls allowed here";
    case 218: return "Type mismatch";
    case 219: return "Unexpected expression";
    case 220: return "Missing expression";
    case 221: return "Boolean expression required";
    case 222: return "Invalid expression";
    case 223: return "Index out of range";
    case 224: return "Division by zero";
    case 225: return "Invalid for loop control variable";
    case 226: return "Mismatched for loop control variable";
    case 227: return "Invalid subscript type";
    case 228: return "Invalid array declaration";
    case 229: return "Cannot alter this variable";
    case 230: return "Can only break out of a loop";
    case 231: return "Addressfield expected";
    case 232: return "String expected";
    case 233: return "Addressfield not allowed";
    default:  return "Compiler error";
  }
}