Practical 12

Computer Science 3 - 2002

Programming Language Translation

Practical for Week 12, beginning 6 May 2002

This prac is due for submission by lunch time on your next practical day, correctly packaged in a transparent folder as usual. Pracs should please be deposited in the hand-in box outside the lab. Only one set of listings is needed for each group, but please enclose as many copies of the cover sheet as are needed, one for each member of the group. These will be returned to you in due course.

Objectives

The objectives of this practical are to allow you to implement a compiler for a slightly extended version of Parva, by bootstrapping from an existing compiler for Parva, level 1.0. This compiler aims to compile Parva source programs into code for a hypothetical stack machine like that described in chapter 4.4 of the text.

Hopefully after doing these exercises (and studying the attributed grammar and the various other support modules carefully) you will find you have learned a lot more about compilers and programming languages than you ever did before (and, I suspect, a lot more than undergraduates at any other university in this country!) I also hope that you will have begun to appreciate how useful it is to be able to base a really large and successful project on a clear formalism - namely the use of attributed context-free grammars - and will have learned to appreciate the use of sophisticated tools like Coco/R.

To hand in:

This week you are required to hand in, besides the cover sheet:

Listings at least of your attribute grammar and code generator, and also the table handler and stack machine emulator if you change the supplied ones. Listings of test programs would be useful too.
A diskette, with your grammar and source files, and preferably some test programs.

I do NOT require listings of any C++ code produced by Coco/R.

Before you begin

The tasks are presented below in an order which, if followed, should make the practical an enjoyable and enriching experience. Please do not try to leave everything to the last few hours, or you will come horribly short. You must work consistently, and with a view to getting an overview of the entire project, as the various components and tasks all interact in ways that will probably not at first be apparent. Please take the opportunity of coming to consult with me at any stage if you are in doubt as how best to continue. By all means experiment in other ways and with other extensions if you feel so inclined.

This version of Parva has been restricted so as not to include functions. Depending on how far we get and what we do next week, this may mean that there could be no practical work set on chapters 16 and 17 of the text. Because of the timing of our courses this is unavoidable, if highly regrettable. You should be warned that some of the material of these two chapters is "examinable".

A note on test programs

Throughout this project you will need to ensure that the features you explore are correctly implemented. This means that you will have to get a feel for understanding code sequences produced by the compiler. The best way to do this is usually to write some very minimal programs, that do not necessarily do anything useful, but simply have one, or maybe two or three statements (the object code for which is easily predicted and understood).

When the Parva compiler has finished compiling a program, say SILLY.PAV, you will find that it creates a file SILLY.COD in which the stack machine assembler code appears. Studying this is often very enlightening.

Useful test programs are small ones like the following. There are some specimen test programs in the kit, but these are deliberately incomplete, wrong in place, too large in some cases and so on. Get a feel for developing some of your own.

    $D+ // Turn on debugging mode
    void main (void) {
      int i, List[10];
      while (1 = 1) { // infinite loop, can generate an index error
        read(i);
        List[i] = 100;
      }
    }

The debugging pragma

It is useful when writing a compiler to be able to produce debugging output - but sometimes this just clutters up a production quality compiler. The PARVA.ATG grammar makes use of the PRAGMAS option of Coco/R (see text, page 166) to allow a line like that shown to have the desired effect.

$D+ // Turn diagnostic mode on for testing the compiler

Task 1 Buy the kit (for free!)

In the kit PRAC12.ZIP you will find far more goodies than ever before.

Task 2 - I dare you - try it out!

Use GENMAKE to make yourself an appropriate MAKE file, as usual, and then go ahead to make the Parva compiler from PARVA.ATG as supplied, and compile it. Apply the compiler to the specimen programs VOTE.PAV and BAD.PAV. Are the results as you expect?

Task 3 - Study the system first (that will cost you!)

In getting to understand the system it is recommended that you take some time to study it before leaping in to modify it. For example:

(a) In Clang a declaration like

VAR List[10];

specifies the upper range of an index, and is equivalent to Pascal's

VAR List : ARRAY [0 .. 10] OF INTEGER;

In Parva, however, a declaration like

int List[10];

specifies the size of the array, not the upper limit on an index, and would be closer to Pascal's

VAR List : ARRAY [0 .. 9] OF INTEGER;

So what on earth would happen if you submitted a declaration like

int x[0];

to the Parva compiler? Does it mean anything to have an array of size 0? What? What does it mean in C++? Is it allowed in C++?

(b) Study how the compiler handles escape sequences like \t and \a and \n that appear in string literals. Does it handle them all correctly? If not, where or why not?

(c) Study carefully how the compiler handles declarations of identifiers local to inner block statements - Consider, for example, programs like

    $D+ // Turn diagnostic mode on for testing the compiler
    // The dynamic semantics of this program are not really meaningful!

    void main (void) {
      int i;                 // i is global
      { int j, k = 10;       // j and k are local to this inner statement
        j = 2 * k;
        int l = j + k;       // l is also local to this statement
        read(i);             // i is still in scope
        if (i > 0) {
          l = k;             // l and k are still in scope
          int m = 90;        // m is local to the body of the if statement
        }
      }
      i = 2 * j + m;         // i is in scope, but j and m are not
    }

    $D+ // Turn diagnostic mode on for testing the compiler
    // The dynamic semantics of this program are not really meaningful!

    void main (void) {
      int i;                 // i is global
      { int i, j, k = 10;    // j and k, and a new i, are local to this statement
        j = 2 * k;
        int l = j + k;       // l is also local to this inner statement
        read(i);             // the second i might be the one in scope?
        if (i > 0) {         // the second might still be the one in scope?
          l = k;             // l and k are still in scope
          int m = 90;        // m is local to the body of the if statement
          bool i = true;     // is yet another i allowed?
        }
        if (i > 0) {
          bool i = true;     // yet another i?
          int m = 90;        // yet another m?
        }
      }
    }

What offsets (addresses) from the base pointer CPU.BP are associated with each variable?

(d) In the second example above it appears that we can have more than one variable named in existence at run time with the name i, and that these variables can even be of different types. Does this really make sense? If so, how do the compiler and interpreter work out which i is which?

(e) In the definition of the symbol table you will find an enumeration of the possible types that variables and expressions can have, like:

enum TABLE_types { TABLE_none, TABLE_ints, TABLE_bools, TABLE_chars };

Surely this is ridiculous? We can only declare variables to be of int, bool or char types. So why is there a none type in the enumeration?

(f) In the Parva manual you will find that a Block is defined as

Block = "{" { Declaration | Statement } "}" .

but in the PARVA.ATG file you will find

Block = "{" { Statement } "}" .

with the production for Statement augmented to include the forms of Declaration. Is there something wrong with the ATG file, or with the manual, with both, or with neither?

Task 4 - Better use of the debugging pragma

We have already commented on the $D pragma. At present it is only used to request the printout of a symbol table. How would you change the system so that one would have to use the pragma if one wanted to obtain the assembler code file - so that the ".COD" file of assembler code is only produced in debugging mode? Also add a pragma to allow you to turn debugging mode off.

$D- /* Turn debugging mode off */

Another useful (run-time) debugging aid is the stackdump statement (see text, page 223). Compilation of this could be controlled by the same pragma (in other words, the stack dumping code would only be generated in debug mode).

(Most of the time you are testing your compiler you will probably be working in "debugging" mode, I expect).

Hint: These changes are almost trivially easy.

Task 5 - Learning multiple languages can be frustrating

One of the nastier mistakes we all probably make in C++ is to write code like

if (a = b) Statement;

when of course we probably meant to write

if (a == b) Statement;

This "feature" of C++ is so often a "bug" that many C++ compilers issue a warning if they come across code like that in the first example (they can only warn, as the code is not incorrect, and, yes, folks, as someone once said "C++ is its own virus"). What does Parva do at present? Can you find a way to be more user friendly? Can you think of similar ways ways of being gentle to people who migrate from Pascal to Parva and confuse Pascal's := operator and Parva's =, confuse Pascal's <> with Parva's != and omit the parentheses round the Condition in the IfStatement and WhileStatement.

Hint: This is quite easy too! Experiment with the WEAK marker as well.

Task 6 - Friendlier array declarations

Extend Parva to allow array declarations also to be made in terms of predeclared constants, for example

    const
      Max = 100;
    int
      List[Max], List2[200];

Syntactically it's trivial. But how do you implement it?

Task 7 - More on scope rules

The compiler as supplied allows you to write code like this

    void main (void) {
      int i, j, k;
      if (i > 0) {
        int i, j, k;
      }
      if (i < 0) {
        int a, b;
      }
      if (i < 0) {
        int b, a;
      }
    }

But in Java you would not be allowed to write some of that code, as the scope rules in Java are different from those in C++. Modify the compiler and/or the symbol table handler to impose the Java rules rather than the C++ rules. Which rules strike you as being "better" for programmers, and why?

Task 8 - The return statement

In a mutifunctional Parva a return statement would be used to leave a void function "early" or to return a value from an int, char or bool function. Those last ones can can wait for another day, but add the simple return statement to Parva so that if it is encountered the program will cease execution.

    if (n > max) {
      print("limit exceeded"); return;
    }

Task 9 - Prompts in read statements

Add the ability to issue prompts in read statements, as exemplified by

    read("Supply Age ", age1, " Supply partner\'s age", age2);

Task 10 - Something to do - while you wait for a tutor

Add the do - while loop to Parva, as exemplified by

    do { a = b; c = c + 10; } while (c < 100);

Task 11 - You had better do this one or else....

Add the else option to the if statement. Oh, yes, it is trivial to add it to the grammar. But be careful. Some if statements will have else parts, others may not, and the code generator has to be able to produce the correct code for whatever form is actually to be compiled. The following silly examples are all valid.

    if (a == 1) { c = d; }
    if (a == 1) {}
    if (a == 1) {} else {}
    if (a == 1) {} else { b = 1; }

Task 12 - The character data type

The compiler as supplied has not, in fact, incorporated the char data type. As a rather larger exercise, extend Parva to allow variables and constants to be declared of this type, values to be read and written and used in expressions, and so on.

You will find various machine level instructions and code generating functions have already been defined and these should be useful in doing this task, so take some time to look at the code generator interface (lines 50, 59, 95, 100, for example) and the stack machine emulator.

Task 13 - Make the change; enjoy life; upgrade now to Parva`++` (Ta-ra!)

At last! Let's really make Parva useful and turn it into Parva++ by adding the increment and decrement statement forms exemplified by

    int parva, list[10];
    char ch;
    ...
    parva++;
    ch--;
    list[10]--;

Hint: The code generator (see lines 87 - 93) and stack machine already incorporate suitable opcodes for use here. The sort of code you need to generate for the first two of the above statements would be

    ADR -1
    PPP
    ADR -12
    MMR