Computer Science 3 - 2001

Programming Language Translation


Practical for Week 19, beginning 27 August 2001

Hand in this prac sheet before lunch time on your next practical day, correctly packaged in a transparent folder with your solutions and the "cover sheet". Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.


Objectives

Ja, well, no, fine. All you folks who thought this was a bit of a jorl so far, think again. But the objectives of this practical are to allow you to learn how to get Coco/R to do most of the work for you. Cheer up - it's not as bad as it might appear!


To hand in:

This week you are required to hand in, besides this cover sheet:

I do NOT require listings of any C++ code produced by Coco/R.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 13 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box in the secretary's office for collection by Pat Terry.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so.


Task 1 Buy the kit (for free!)

In the kit PRAC19.ZIP you will find even more goodies than usual. For example, you will find the grammar for Clang, level 1, as discussed in section 8.7.2, and extended in Prac 17, in Cocol/R format ready for processing by Coco/R. You will also find an attributed grammar for a system that will analyse a set of EBNF productions to check some aspects of correctness - specifically that all non-terminals introduced seem to have been defined, and that all non-terminals appear on the right of at least one production. There are also grammars for trains, and for simple expressions.


Task 2 - Study the handout (that will cost you!)

Study the sample grammar for analysing EBNF productions to see how the attributes have been added. If you like, make the system and try it out in the usual way:

GENMAKE EBNF BORL55
MAKE -f EBNF.MAK
EBNF EBNF.TXT
EBNF EBNF.BAD

Please ask the demonstrators if there are things you don't understand in the grammar.


Task 3 - Have fun playing trains again

The file TRAIN.ATG contains a simple grammar describing trains, as discussed in prac 17 (this one does not have any "safety" regulations, which we can't express with an LL(1) grammar). The file TRAINS has some simple train patterns.

  COMPILER Train $XCN
  /* Grammar for simple railway trains
     P.D. Terry, Rhodes University, 2001 */

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)
  COMMENTS FROM "(*" TO "*)" NESTED

  PRODUCTIONS
    Train      = { OneTrain } EOF .
    OneTrain   = LocoPart GoodsPart HumanPart SYNC "." .
    LocoPart   = "loco" { "loco" } .
    GoodsPart  = Truck { Truck } .
    HumanPart  = "guard" | { "coach" } "brake" .
    Truck      = "coal" | "open" | "cattle" | "fuel" | "petrol" .
  END Train.

Go on to add actions and attributes to the grammar so that it will

Hints: For a problem as simple as this one does not really need to parameterise the parsing routines - it will suffice to store the "semantic state" in a few local variables, which, like the #include <misc.h> can be introduced at the start of the ATG file. You may wish to explore the use of the SemError facility (see pages 172-173) for placing error messages in the listing file at the point where you detect that safety regulations have been disobeyed. To do this, make a copy of the compiler frame file under the name TRAIN.FRM, that is

COPY COMPILER.FRM TRAIN.FRM

and then locate and modify the section where you are invited to add your own semantic error messages each with their own unique number - starting these numbers at about 1000 ensures that they are unlikely to collide with syntactic error messages/numbers generated by Coco/R itself (see page 175 for more details).


Task 4 - Evaluate some expressions

The file EXPLST.ATG contains a simple grammar describing expressions, of the sort found in the file DATA, with one expression evaluation per line. This grammar also has the extension used in last week's prac test to allow for "powers" to be calculated, as in the expression 23 + 4(2+3). Notice, too, how the key word SYNC has been used so that parsing can recover if the system is given incorrect expressions.

   COMPILER ExpLst $XNC

   IGNORE CHR(1) .. CHR(12)

   CHARACTERS
     digit      = "0123456789" .
     letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     eol        = CHR(13) .

   TOKENS
     number     = digit { digit } .
     identifier = letter .
     EOL        = eol .

   PRODUCTIONS
     ExpLst     = { identifier "=" Expression SYNC EOL } EOF .
     Expression = Term { "+" Term | "-" Term } .
     Term       = Factor { "*" Factor | "/" Factor } .
     Factor     = Primary [ "^" Factor ] .
     Primary    = identifier | number | "(" Expression ")" .
   END ExpLst.

Add actions to this grammar so that it will be able to evaluate the expressions. It will suffice to deal with integer arithmetic only. To start with, assume that expressions will not make use of any "variables".

Hint: If you study page 153 of the textbook it should almost immediately become obvious how to do this.

Bonus: The grammar as given allows for "variables" under the names A through Z (in other words, it suggests a calculator with 26 memory cells). How would you extend the system so that values could be stored in and retrieved from such variables?

Second bonus: Can anything go wrong in evaluating simple expressions? If so, how do you detect the fact and report it?


Task 5 - Explore the CASE statement

The file CLANG.ATG contains a grammar like the one you might have developed last week for an extended Clang system that allows for CASE statements. The data files CASE*.CLN and BADCASE*.CLN have some examples of silly programs (silly in the sense that they have no real dynamic semantics) using and abusing this construction.

The relevant part of the grammar that you should look at first is

  CaseStatement = "CASE" Expression "OF"
                    OneCase { "|" OneCase }
                    [ "DEFAULT" ":" StatementSequence ]
                  "END" .
  OneCase       = [ Range { "," Range } ":" StatementSequence ] .
  Range         = IntConst [ ".." IntConst ] .
  IntConst      = [ "+" | "-" ] number .

which defines CASE statements in a very similar way to that found in Modula-2 as it happens. You might like to look at this carefully, and ask yourself whether a simpler system like

  CaseStatement = "CASE" Expression "OF"
                    { OneCase }
                    [ "DEFAULT" ":" StatementSequence ]
                  "END" .
  OneCase       = Range { WEAK "," Range } ":" StatementSequence .
  Range         = IntConst [ ".." IntConst ] .
  IntConst      = [ "+" | "-" ] number .

would also have been acceptable (and if not, why not?). And what is the point of making OneCase nullable, for heavens sake?

First make the simple parser from this grammar, and try out the system on the various data files, and on any variations that might suggest themselves. Take a critical look at the "error recovery" that is specified by the strategically placed SYNC and WEAK directives (these are discussed in the text in detail on pages 169 through 171).

Then go on to add "static semantic" checks to the CASE statement parsing process. What, you may ask, is he on about now? Well, in a CASE statement each case label must be unique - a situation complicated in its detection slightly by the ability to declare "ranges" of labels.

A suggested way to handle this problem is to make use of some auxiliary system that can maintain simple lists. You should remember doing this sort of thing in your data structures course last year, and George Wells kindly gave me the code that he showed you in that course. You will find a slight variation on his simple template class for list handling in the file GENLIST.H, and in the file LIST.CPP is a very simple example of a program that uses this class, in case you have forgotten all that stuff!

However, if you want to make use of auxiliary classes in Coco/R generated systems you have to be careful to get the visibility of your various components correct. If you look at CLANG.ATG you will see that it has two lines at the top reading

    #include "genlist.h"
    typedef List<int> LabelList;

and these same two lines have to be added to the compiler "frame file". I have already done this to form the file CLANG.FRM, but you will have to edit CLANG.FRM further to incorporate appropriate semantic error messages if you need to do so (you do!).


  COMPILER Ebnf $XCN
  /* Describe simple EBNF productions
     P.D. Terry, Rhodes University, 2001 */

  /* We may include arbitrary text here, in particular #include directives */

  #include "miscebnf.h"

  CHARACTERS
    letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
    lowline  = "_".
    digit    = "0123456789".
    noquote1 = ANY - "'".
    noquote2 = ANY - '"'.

  IGNORE CHR(9) .. CHR(13)

  COMMENTS FROM "(*" TO "*)"  NESTED

  TOKENS
    nonterminal = letter {letter | lowline | digit} .
    terminal    = "'" noquote1 {noquote1} "'" | '"' noquote2 {noquote2} '"' .

  PRODUCTIONS
     Ebnf
     =                                     (. TABLE_InitTable() .)
       { Production }                      (. if (Successful())
                                                TABLE_TestProductions(); .)
       EOF .

     Production
     =                                     (. char Name[100]; .)
       SYNC nonterminal                    (. LexString(Name, sizeof(Name) - 1);
                                              TABLE_Add(Name, TABLE_LHS) .)
       WEAK "=" Expression SYNC "." .

     Expression = Term { WEAK "|" Term } .

     Term = [ Factor { Factor } ] .

     Factor
     =                                     (. char Name[100]; .)
         nonterminal                       (. LexString(Name, sizeof(Name) - 1);
                                              TABLE_Add(Name, TABLE_RHS) .)
       | terminal
       | "[" Expression "]"
       | "(" Expression ")"
       | "{" Expression "}" .

  END Ebnf.

------------------------------------------------------------------------------

  // Various common items for grammar checker

  #ifndef MISCEBNF_H
  #define MISCEBNF_H

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <stdarg.h>
  #include <ctype.h>
  #include <limits.h>

  // the usual stuff you have seen before, augmented by

  typedef enum { TABLE_LHS, TABLE_RHS } SIDES;

  typedef struct {
    char Name[50];
    int Left, Right;
  } ENTRIES;

  const int TABLE_MaxProductions = 100;
  ENTRIES TABLE_Table[TABLE_MaxProductions];
  int TABLE_NEntries;

  static void TABLE_InitTable() {
  // Initialise table
    TABLE_NEntries = 0;
  }

  static void TABLE_Add (char *N, SIDES Where) {
  // Add a nonterminal name N to the table, according as to Where in a
  // production it was found
    if (TABLE_NEntries > TABLE_MaxProductions) {
      fprintf(stderr, "Table Overflow"); exit(1);
    }
    strcpy(TABLE_Table[TABLE_NEntries + 1].Name, N); // sentinel
    TABLE_Table[TABLE_NEntries + 1].Left = 0;
    TABLE_Table[TABLE_NEntries + 1].Right = 0;
    int i = 1;
    while (strcmp(N, TABLE_Table[i].Name)) i++;
    if (i > TABLE_NEntries) TABLE_NEntries++; // new entry
    if (Where == TABLE_LHS) TABLE_Table[i].Left++; else TABLE_Table[i].Right++;
  }

  static void TABLE_TestProductions() {
  // Check that all non terminals have appeared once on the left side of
  // each production, and at least once on the right side of each production
    bool OK = TRUE; // optimistic, even if he is a bastard!
    int i;
    for (i = 1; i <= TABLE_NEntries; i++) {
  //  printf("%s %d %d\n", TABLE_Table[i].Name, TABLE_Table[i].Left, TABLE_Table[i].Right);
      if (TABLE_Table[i].Left == 0) {
        OK = false;
        printf("%s never defined\n", TABLE_Table[i].Name);
      }
      else if (TABLE_Table[i].Left > 1) {
        OK = false;
        printf("%s defined more than once\n", TABLE_Table[i].Name);
      }
    }
    for (i = 2; i <= TABLE_NEntries; i++) {
      if (TABLE_Table[i].Right == 0) {
        OK = false;
        printf("%s cannot be reached\n", TABLE_Table[i].Name);
      }
    }
    if (OK) printf("Seems to be reduced grammar\n");
    else printf("Cannot be reduced grammar\n");
  }

  #endif /* MISCEBNF_H */

------------------------------------------------------------------------------

  /* Simple generic linked list class.  Only the interface is shown here
     George Wells  --  27 February 1996 */

  template<class T> class List
    { public:
        List (void);                              // Constructor
        List (const List& lst);                   // Copy constructor
        ~List (void);                             // List destructor
        void add (T item, int position = MAXINT); // Place new item in a List
        void remove (int position);               // Remove item at position in List
        int length (void);                        // Return number of elements in List
        T& operator[] (int index);                // Subscript a List
        int position (T item);                    // Find item in a List
        int isMember (T item);                    // True if item is in List
    }; // class List


Home  © P.D. Terry