Computer Science 3 - 2006

Programming Language Translation


Practical for Week 24, beginning 9 October 2006

Hand in your solutions to this practical before lunch time on your next practical day, correctly packaged in a transparent folder with your cover and individual assessment sheets. Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker. Please make it clear whose folder you have used for the electronic submission, for example g03A1234. Please resist the temptation to carve up the practical, with each group member only doing one task. The group experience is best when you work on tasks together.


Objectives:

Ja, well, no, fine. All you folks who thought this was a bit of a jorl so far, or have been thinking that IS projects are an excuse for back-sliding - think again. In this practical you are to

You will need this prac sheet and your text book. As usual, copies of the prac sheet are also available at http://www.cs.ru.ac.za/CSc301/Translators/trans.htm.


Outcomes:

When you have completed this practical you should understand


To hand in:

This week you are required to hand in, besides the cover sheets (one per group member):

I do NOT require listings of any Java code produced by Coco/R.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated in our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box outside the laboratory and not given to demonstrators.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another group's or student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so. You are expected to be familiar with the University Policy on Plagiarism, which you can consult at:

        http://www.scifac.ru.ac.za/plag.htm 


Task 1 - creating a working directory and unpacking the prac kit

There are several files that you need, zipped up this week in the file PRAC24.ZIP (Java version) or PRAC24C.ZIP (C# version)


Task 2 - First steps to a Boolean calculator

In the kit you will find Bool.atg. This is a (partly) attributed grammar for a Boolean calculator, based on the grammar you might have developed in Practical 22, and which can store values in any of 26 memory locations, inspiringly named A through Z.

   import Library.*;
   import java.util.*;

   COMPILER Bool $CN
   /* Boolean expression calculator
      P.D. Terry, Rhodes University, 2006 */

     static boolean[] mem = new boolean[26];

   IGNORECASE

   CHARACTERS
     letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
   TOKENS
     variable   = letter .

   COMMENTS FROM "(*" TO "*)"  NESTED
   COMMENTS FROM "/*" TO "*/"  NESTED

   IGNORE CHR(0) .. CHR(31)

   PRODUCTIONS
     Bool                               (. int index = 0;
                                           boolean value = false;
                                           char var;
                                           for (int i = 0; i < 26; i++) mem[i] = false; .)
     = { Variable<out var>              (. index = var - 'A'; .)
         "=" Expression                 (. mem[index] = value;
                                           IO.writeLine(value); .)
         ";"
       } EOF .

     Variable<out char var>
     = variable                         (. var = token.val.charAt(0); .)
     .

     Expression = Term { Or Term } .
     Term       = Factor { [ And ] Factor } .
     Factor     = "NOT" Factor | Primary { "'" } .
     Primary    = True | False | variable | "(" Expression ")" .
     True       = "TRUE" | "1" .
     False      = "FALSE" | "0" .
     And        = "AND" | "&&" | "." .
     Or         = "OR" | "||" | "+" .
   END Bool.

Use Coco/R to generate and then compile source for this complete calculator. You do this most simply by

cmake Bool

A command like

crun Bool bool.txt

will run the program Bool and try to parse the file bool.txt, sending error messages to the screen. Giving the command in the form

crun Bool bool.bad -L

will send an error listing to the file listing.txt, which might be more convenient. Try this out.

Study the grammar carefully. If there are any aspects of it that you do not understand, please ask one of the tutors to explain them.


Task 3 - Complete the calculator

Of course, the calculator still only parses expressions, it does not evaluate them. Complete the attributes for the other non-terminals, so as to define a complete calculator.


Task 4 - A better calculator

Now make the following extensions to the system:

Test your system out thoroughly - give it both correct and incorrect data.


Task 5 - A CSC 201 assembler lister

The next few tasks are all related to various aspects of parsing and analysing programs written in the CSC 201 toy assembler language, for which you were invited to write a Cocol specification in Practical 22.

One possible grammar for this language is given below:

   import Library.*;
   import java.util.*;

   COMPILER ASM $NC
   /* Define Assembler language as used in CSC 201
      Comments defined as tokens
      P.D. Terry, Rhodes University, 2006 */

   public static OutFile output;

   IGNORECASE

   CHARACTERS
     lf         = CHR(10) .
     cr         = CHR(13) .
     control    = CHR(0) .. CHR(31) .
     letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     digit      = "0123456789" .
     binDigit   = "01" .
     hexDigit   = digit + "ABCDEFabcdef" .
     charCh     = ANY - "'" - control .
     printable  = ANY - control .

   TOKENS
     number     = digit { digit } | binDigit { binDigit } "%" | digit { hexDigit } "H" .
     identifier = letter { letter | digit } .
     string     = "'" charCh { charCh } "'" .
     EOL        = cr lf | lf .
     comment    = ";" { printable } .

   PRODUCTIONS
     ASM               = Begin StatementSequence End .
     Begin             = { OptionalComment } "BEG" OptionalComment .
     End               = "END" { OptionalComment } .
     OptionalComment   = [ comment ] SYNC EOL .
     StatementSequence = { Statement OptionalComment } .
     Statement         = [ Label ] [ OneByteOp | TwoByteOp Address ] .
     OneByteOp         =    "ASR"  | "CLA"  | "CLC"  | "CLV"  | "CLX"  | "CMA"  | "CMC"
                          | "DEC"  | "DEX"  | "HLT"  | "INA"  | "INB"  | "INC"  | "INH"
                          | "INI"  | "INX"  | "NOP"  | "OTA"  | "OTB"  | "OTC"  | "OTH"
                          | "OTI"  | "POP"  | "PSH"  | "RET"  | "SHL"  | "SHR"  | "TAX" .
     TwoByteOp         =    "ACI"  | "ACX"  | "ADC"  | "ADD"  | "ADI"  | "ADX"  | "ANA"
                          | "ANI"  | "ANX"  | "BCC"  | "BCS"  | "BGE"  | "BGT"  | "BLE"
                          | "BLT"  | "BNG"  | "BNZ"  | "BPZ"  | "BRN"  | "BVC"  | "BVS"
                          | "BZE"  | "CMP"  | "CPI"  | "CPX"  | "DC"   | "DS"   | "EQU"
                          | "JGE"  | "JGT"  | "JLE"  | "JLT"  | "JSR"  | "LDA"  | "LDI"
                          | "LDX"  | "LSI"  | "LSP"  | "ORA"  | "ORG"  | "ORI"  | "ORX"
                          | "SBC"  | "SBI"  | "SBX"  | "SCI"  | "SCX"  | "STA"  | "STX"
                          | "SUB"  .
     Address           = Term { '+' Term | '-' Term } .
     Term              = Label | IntConst | StringConst | '*' .
     Label             = identifier .
     IntConst          = number .
     StringConst       = string .
   END ASM.

Task 5 is to enhance this grammar with actions that will copy the assembler source program to an output file with each of the label, opcode, address and comment fields neatly lined up. Essentially one has simply to write each terminal as it is parsed. Lining up the fields calls for some care, as not all fields are found on each line. However, the aim should be to allow messy code like

                             ; P.D. Terry, 2006
             BEG  ; Count the bits in a number
             CLA        ; CPU.A := 0
         STA  BITS      ; BITS := 0
             INI                    ; Read(CPU.A)
     LOOP            ; REPEAT

to be listed more neatly as follows (this sort of output is often called "pretty-printing")

                             ; P.D. Terry, 2006
             BEG             ; Count the bits in a number
             CLA             ; CPU.A := 0
             STA  BITS       ; BITS := 0
             INI             ; Read(CPU.A)
     LOOP                    ; REPEAT

Hints:

(a) It will be a good idea to edit the Driver.frame file, and from this to create a customized ASM.frame file that will allow the parser to direct its output to a new file whose name is derived from the input file name by a change of extension (copy Driver.frame to ASM.frame and edit it, reading the embedded instructions and hints carefully).

(b) On page 255 there is a discussion of how one may write a production that associates an action with an empty option, and you may find this very useful. For example, we might have a section of grammar that amounts to

A = B [ C ] D .

which might be suitably attributed

          A = B [ 
                  C (. action when C is present .) 
                ] D . 

but in some situations we might prefer or need

          A = B (   C (. action when C is present .) 
                  |   (. action when C is absent .) 
                ) D . 

(c) You might be tempted to get the desired spacing on the output by arranging for "tabs" to be inserted at suitable points. These have the disadvantage that tab settings differ from one output medium or library to another, and what might look pretty in some situations won't look pretty in others. It is better to use the "fixed width" output routines in the Library. (Look them up on the web page!)


Task 6 - Some constraint checks

The grammar given earlier has the virtue of being LL(1), but in fact it does not impose various constraints on assembler programs that we really need. These are

(a) an EQU directive is meaningless unless a label is present:

               MAX  EQU  100      ; correct
                    EQU  34       ; incorrect and useless

(b) an ORG directive should not be allowed to have a label:

                    ORG  100      ; correct
               LAB  ORG  100      ; incorrect

(c) a string constant of length greater than one should only be permitted as an Address field that follows a DC directive, and in that case the Address field can consist of that string only: :

                    LDI  'a'            ; correct
                    LDI  'a' + 'b'      ; correct
                    LDI  'abcde'        ; incorrect
                    DC   'Tyrant'       ; correct
                    DC   'Pat' + Sally' ; incorrect
                    DC   1 + 'Fred'     ; incorrect
                    DC   'Thabo' + 2    ; incorrect

Modify the attributed grammar to handle these constraints.


Task 7 - A cross reference generator for assembler programs

A cross reference generator for the ASM language is a program that analyses an assembler source program and then prints a list of the labels in it, along with the line numbers where they appeared. A cross reference listing can be extremely useful when you are asked to maintain very big programs. Such a listing, for the BITS.ASM program in the kit, might look like the one below (negative numbers denote the lines where the label appeared in the "label" field and positive numbers where it appeared in an "address" field):

   START                -2
   BITS                  3    9   11   14  -17
   LOOP                 -5   13
   EVEN                  7  -13
   STOP                 -16

Modify the ASM grammar and provide the necessary support routines (that is, add a simple symbol table handler) to be able to generate such a listing.

Hints:

(a) This should turn out to be a lot easier than it at first appears. You will notice that the ASM grammar makes references to labels in only two places, so with a bit of thought you should be able to see that there are in fact very few places where the grammar has to be attributed. When you have thought about the implications of that hint, check out your ideas with a tutor, so as not to spend fruitless hours writing far more code than you need.

(b) You will, however, have to develop a symbol table handler. This can make use of the ArrayList class in two ways. You will need a list of records of the identifiers. For each of these records you will also need a list of records of the line numbers. A class like the following might be a useful one for this purpose:

  class Entry {
    public String name;             // the label's name
    public ArrayList refs;          // its list of line numbers
    public boolean defined;         // true if it has appeared in a label field
    public boolean referenced;      // true if it has appeared in an address field

    public Entry(String name, boolean defined) {
      this.name = name;
      this.refs = new ArrayList();
      this.defined = defined;
      this referenced = false;
    }
  } // Entry

and your Table class (a skeleton of which is supplied in the file ASM\Table.java) could be developed from the following ideas:

  class Table {

    public static void clearTable() {
    // Resets (empties) the table

    public static void addRef(String name, boolean defining, int lineRef) {
    // Enters name if not already there, adds another line reference (negative if
    // it appears in the label field in the source text) and
    // updates the status of the label (referenced or defined)
    public static void printTable() {
    // Prints out all references in the table

  } // Table

You will be relieved to hear that each of these methods can be implemented in only a few lines of code, provided you think clearly about what you are doing.

(d) What may have escaped your attention is that "support" classes (like Table.java) needed by an application (like ASM) must be stored in a sub-directory whose name matches the goal symbol (like ASM).


Task 8 - Consistent use of labels in assembler programs

It is quite easy to make the following errors when coding in assembler

(a) redefining a label by using it in more than one "label" field;

(b) referring to a label in an "address field", but which is never defined in a "label" field;

(c) defining a label in a "label" field that is never referenced in an "address" field (this is not necesasrily incorrect, but may relate to carelessness).

These errors are exemplified in the following otherwise pointless code:

                  BEG
                  BRN   LIFE     ; LIFE is defined twice
           LOOP   INI            ; LOOP is never referenced
                  ADI   24
                  CMP   TOTAL    ; TOTAL is defined and referenced
                  BNZ   LIFE
                  BRN   AGAIN    ; AGAIN is not defined
           LIFE   HLT
           TOTAL  DS    1
           LIFE   DC    42
                  END

Modify your system further to be able to detect cases (a) and (b) as errors and to draw attention to (c) in the cross reference listing or as a compiler warning.

Hint: This exercise can also be used to highlight a further application of the StringBuffer class that you used in Practical 23. Use an object of this type to build up the lists of labels that turn out to be "undefined", or "not referenced", as shown in the example above.

Yes, I know, this exercise can be done using classes other than ArrayList and StringBuffer. However, these classes are used in various illustrations in the course, and it is as well for you to be properly familiar with them.


Appendix - simple use of the ArrayList class

The following code shows how the ArrayList class can be used to construct a list of records of people's names and ages, and then to display this list and interrogate it. Note the use of casting operations! You can compile and run the program at your leisure, as it is in the prac kit.

  import Library.*;
  import java.util.*;

  class ShowArrayList {
  // Demonstrate simple application of the ArrayList class

    static class Entry {
      public String name;
      public int age;                                     // other fields could be added

      public Entry(String name, int age) {                // constructor
        this.name = name;
        this.age = age;
      }
    } // Entry

    static ArrayList list = new ArrayList();              // global for simplicity here!

    public static int position(String name) {
    // Finds position of entry with search key name in list, or -1 if not found
      int i = list.size() - 1;                            // index of last entry
      while (i >= 0 &&                                    // short-circuit protection
             !name.equals(((Entry)list.get(i)).name))     // must cast before extracting field
        i--;                                              // will reach -1 if no match
      return i;
    }

    public static void main (String[] args) {
    // Build a list of people's names and ages
      IO.writeLine("Supply a list of people's names and ages.  CTRL-Z to terminate");
      do {
        String name = IO.readWord();
        int age = IO.readInt();
        IO.readLn();
        if (!IO.eof()) list.add(new Entry(name, age));    // add to end of list
      } while (!IO.eof());

      IO.writeLine(list.size() + " items stored");        // report size of list

      StringBuffer sb = new StringBuffer();               // demonstrate StringBuffer use
      for (int i = 0; i < list.size(); i++) {             // display each entry
        Entry e = (Entry) list.get(i);                    // retrieve (via a cast) an item at position i
        IO.write(e.name, -16); IO.writeLine(e.age);       // -16 means "left justify"
        sb.append(e.name + " ");                          // add the names to a StringBuffer object
      }
      IO.writeLine();

      int where = position("Pat");                        // find the silly fellow!
      if (where < 0) IO.writeLine("Pat not found");
      else {
        Entry pat = (Entry) list.get(where);
        IO.writeLine("Pat found at position " + where + ". He is " + pat.age + " years old");
      }

      if (sb.length() > 0) {
        IO.writeLine();
        IO.writeLine("Summary of names:");
        IO.writeLine();
        IO.writeLine(sb.toString());
      }
    }

   } // ShowArrayList


Home  © P.D. Terry