Tutorial 14 Solution

Computer Science 301 - 2002

Tutorials for week 14 - (a) Practice in attribute grammars and code generation

1. As many of you know, my brother Peter occasionally brightens up your TV screens with good advice as to how to keep your toilets clean using Domestos and your bathroom walls beautiful with tiles from CTM. (I suspect he may teach you more useful skills than I do!) What you may not know is that he worked for State Theatre in Pretoria, which recently folded, so he is dependent more than ever before on what the SABC pay him. He has asked me to write a system to allow him to analyse the offerings of the SABC to determine the number of times he appears on screen for each company. I have got as far as the following grammar

     COMPILER SABCTV $XCN

     IGNORE CHR(1) .. CHR(31)

     PRODUCTIONS
       SABCTV      = { Programme } "Closedown" .
       Programme   = "Announcement" { QualityShow }
                     "Reminder" /* you know it's the right thing to do */ .
       QualityShow = "Programme" { Advert } .
       Advert      =   "CTMTiles" | "Domestos" | "Reminder"
                     | "VodaCom" | "Nando's" | "LGDigital" .
     END SABCTV.

Would Coco/R object to the fact that I have used the word Programme in two ways? Explain.

Not at all. Programme is the name of a non-terminal, and "Programme" is the spelling of a terminal token.

Have I produced an LL(1) grammar? If not, where not?

The production for QualityShow contains the nullable section { Advert }. The first set of this is

{ CTMTilesSym, DomestosSym, ReminderSym, VodaComSym, NandosSym, LGDigitalSym }

but { ReminderSym } is the follow set of QualityShow. So the grammar is not LL(1).

Assume for the remaining questions that any LL(1) violations are not important and that the use of the word Programme in two ways is unimportant.

How do I add the actions to be able to count his appearances?

Two solutions are shown. Note that Peter does not appear in all possible adverts!

     int CTM, DOM;  // globally declared

     SABCTV
     =                                   (. CTM = 0; DOM = 0; .)
     { Programme }
     "Closedown"                         (. printf("CTM: %d, Domestos: %d\n", CTM, DOM); .)
     .

     Programme = "Announcement" { QualityShow } "Reminder" .

     QualityShow = "Programme" { Advert } .

     Advert
     =    "CTMTiles"                     (. CTM++; .)
        | "Domestos"                     (. DOM++; .)
        | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .

     SABCTV
     =                                   (. int CTM = 0, DOM = 0; .)
       { Programme<CTM, DOM>
       } "Closedown"                     (. printf("CTM: %d, Domestos: %d\n", CTM, DOM); .)
       .

     Programme<int &c, int &d>
     = "Announcement" { QualityShow<c, d> } "Reminder" .

     QualityShow<int &c, int &d> = "Programme" { Advert<c, d> } .

     Advert<int &CTM, int &DOM>
     =    "CTMTiles"                     (. CTM++; .)
        | "Domestos"                     (. DOM++; .)
        | "Reminder" | "VodaCom" | "Nando's" | "LGDigital" .

The SABC fear that he may be suffering from over-exposure and have asked me to extend/write a system to allow them to check that he never appear in two advertisements in a row (that is, that the Domestos and CTM adverts must be separated by at least one other advert). How do I add the actions to be able to check that aspect of my brother's appearances? (Assume for this question that any LL(1) violations are not important and that the use of the word Programme in two ways is unimportant.)

Two possible solutions are suggested below.

     bool lastWasPeter;   /* global */

     SABCTV
     =                                   (. lastWasPeter = false; .)
     { Programme } "Closedown" .
     Programme   = "Announcement" { QualityShow } "Reminder" .
     QualityShow = "Programme" { Advert } .
     Advert
     =   ( "CTMTiles" | "Domestos" )     (. if (lastWasPeter) SemError(200);
                                            lastWasPeter = true; .)
       | (   "Reminder" | "VodaCom"
           | "Nando's" | "LGDigital" )   (. lastWasPeter = false; .) .

     SABCTV
     =                                   (. bool last = false, okay = true .)
     { Programme<okay, last> }           (. if (!okay) printf("Peter overexposed"); .)
     "Closedown" .

     Programme<bool &okay, bool &last>
     = "Announcement" { QualityShow<okay, last> } "Reminder" .

     QualityShow<bool &okay, bool &last>
      = "Programme" { Advert<okay, last> } .

     Advert<bool &okay, bool &lastWasPeter>
     =   ( "CTMTiles" | "Domestos" )     (. if (lastWasPeter) okay = false;
                                            lastWasPeter = true; .)
       | (   "Reminder" | "VodaCom"
           | "Nando's" | "LGDigital" )   (. lastWasPeter = false; .) .

Does your grammar check his appearances within one burst of adverts only, or across a whole evening? If you wanted it the other way around, what would you have to change?

It checks the whole evening. If you wanted to check one burst of adverts only you could do this with a system like:

     bool lastWasPeter;   /* global */

     SABCTV = { Programme } "Closedown" .

     Programme
     =                                   (. lastWasPeter = false; .)
     "Announcement" { QualityShow } "Reminder" .

     QualityShow = "Programme" { Advert } .

     Advert
     =   ( "CTMTiles" | "Domestos" )     (. if (lastWasPeter) SemError(200);
                                            lastWasPeter = true; .)
       | (   "Reminder" | "VodaCom"
           | "Nando's" | "LGDigital" )   (. lastWasPeter = false; .) .

2. Some languages, notably Pascal, Modula-2, C, C++, and Ada, allow programmers the flexibility to define what are often known as enumeration types, or simply enumerations.

     enum INSTRUMENTS { Drum, Bass, Guitar, Trumpet, Trombone, Saxophone, Bagpipe } JazzBand[41];
     enum CARS { Golf, Tazz, Sierra, BMW316 } CarHireFleet[101];  /* C or C++ */

Sometimes the variables are declared in terms of enumerations like this:

     enum COLOURS { Red, Orange, Yellow, Green, Blue, Indigo, Violet };
     COLOURS Walls, Ceiling, Roof;

The big idea here is to introduce a distinct, usually rather small, set of values which a variable can legitimately be assigned. Internally these values are represented by small integers - in the case of the CarHireFleet example the "value" of Golf would be 0, the value of Tazz would be 1, and so on.

In the C/C++ development of this idea the enumeration, in fact, results in nothing more than the creation of an implicit list of const int declarations. Thus the code

           enum CARS { Golf, Tazz, Sierra, BMW316 } CarHireFleet[101];

is semantically completely equivalent to

           const int Golf = 0; const int Tazz = 1; const int Sierra = 2, const int BMW316 = 3; 
           int CarHireFleet[101];

and to all intents and purposes this gains very little, other than possible readability; an assignment like

           CarHireFleet[N] = Tazz;

might mean more to a reader than the operationally identical

           CarHireFleet[N] = 1;

Enumerations are a "luxury" - clearly they are not really needed, as all they provide is a slightly safer way of programming with small integers. Not surprisingly, therefore, they are not found in languages like Java (simplified from C++) or Oberon (simplified from Modula-2). In Modula and Pascal they are, however, strictly typed and distinct from integers in the programmer's eye.

How would you add the ability to declare variables of such enumeration types in Parva++ programs? Consider the modifications you would need to make to the symbol table handling and to the routines for declaring variables.

As usual, I have provided a kit with the full sources for my solution on the course WWW pages as TUT14.ZIP. There must be several ways of solving the problem posed; this is the one that occurred to me. As always, I have tried to illustrate a simple approach. At the heart of this is the idea that the concept of type is modelled in the Parva system by enumerating the types themselves as a set of small integers . Each type has a unique internal "number" assigned to it - all we have to do is extend the range of these numbers and virtually everything else is already provided in the compiler for you already!

The syntactic changes we need to solve the problem of adding enumeration types to Parva are simple: We need to extend the syntax for VarDeclarations, which might conveniently introduce new non-terminals VarList and EnumType. We also need to extend the Assign production, as we can now have statements starting with an (enumeration type) identifier which introduce more declarations:

  VarDeclarations         =   ( "int" | "char" | "bool" ) VarList
                            | EnumType [ VarList ] .

  VarList                 =  OneVar { "," OneVar }  ] ";" .

  EnumType                = "enum" Ident "{" OneEnum { "," OneEnum } "}"

  OneEnum                 = Ident .

  AssignmentOrDeclaration = Designator ( "=" Expression | "++" | "--" | VarList ) .

Essentially all the EnumType parser will have to do is inject a whole lot of identifiers into the symbol table, most of them being constants with successive small integral values, generated automatically as the parse proceeds.

The secret in solving this problem neatly lies in finding a simple way to add types to the language. In the original system the four basic types were themselves defined by an enumeration. In an extended system we might replace this enumeration with a set of explicit simple integers. The header for the table class is altered to contain the declarations:

  typedef int TABLE_types;   // no longer enumeration

  const int TABLE_none  = 0; // basic types - same intrinsic values as before
  const int TABLE_ints  = 1;
  const int TABLE_chars = 2;
  const int TABLE_bools = 3;

We need to extend the TABLE_entries structure to allow for a new class of identifier - TABLE_enums:

  enum TABLE_idclasses { TABLE_consts, TABLE_vars, TABLE_progs, TABLE_enums };

At the heart of the enumeration type concept is the need to have some way of recording the maximum value that a variable can assume, so that range checking can be implemented. This becomes a property of each variable and also of the enumeration "type" itself. In the original system this was not necessary -the only limited range was for characters, and this was handled in an ad-hoc way. For simplicity all types should have this new attribute, of course:

  struct TABLE_entries {
    TABLE_alfa name;             // identifier
    TABLE_idclasses idclass;     // class
    int self;
    TABLE_types type;
    union {
      struct {
        int value;
      } c;                       // constants
      struct {
        int size, offset, max;   // -------  extended
        bool scalar;
      } v;                       // variables
      struct {                   // added
        int maxValue;
      } e;                       // -------  enumerations
    };
  };

With that background, let us examine the modifications needed to the attributed grammar:

Firstly we need to be able to extend the "range" of the typeset type, as we need to have type numbers potentially reaching fairly large values. 255 is a value for demonstration purposes only.

  typedef Set<255> typeset;                     /* ------------- extended */

Each successive new enumeration type is then distinguished by the next value in a sequence which starts at 0 (for TABLE_none) and progresses through the values 1, 2 and 3 to TABLE_bool. It is convenient to have a global integer to record the last type introduced in this way (a neater solution would hide this in the table handler module, but I was lazy). Initially this is set to TABLE_bools just before parsing commences. It is also convenient to have a set type similar to arithtypes and booltypes that can be used to accumulate the types for which the ++ and -- operations are permissible:

  int lastType;                                 /* ------------- extended */
  typeset incdectypes, arithtypes, booltypes;   /* ------------- extended */

Compatibility has to be very strict for enumerations. As it happens, the essence of type checking can all be achieved by a very simple modification to the compatible function:

  bool compatible (TABLE_types atype, TABLE_types btype)
  // returns true if atype is compatible with btype
  { return arithtypes.memb(atype) && arithtypes.memb(btype)
           || booltypes.memb(atype) && booltypes.memb(btype)
           || atype == btype;                   /* ------------- extended */
  }

The VarDeclarations parser uses a local variable max to record the largest internal value a variable may be assigned. This is then passed to the OneVar parser so that it may be recorded in the appropriate symbol table entry. Note that for integers we use the value of maxint, which can be detected as a special case by the code generator.

  VarDeclarations<int &framesize>             /* ------------- extended */
  =                             (. TABLE_types type; int max; .)
     (   "int"                  (. type = TABLE_ints; max = maxint; .)
       | "char"                 (. type = TABLE_chars; max = 255; .)
       | "bool"                 (. type = TABLE_bools; max = 1; .)
     ) VarList<framesize, type, max> WEAK ";"
     | EnumType<type, max>                    /* ------------- extended */
       [ VarList<framesize, type, max> ] WEAK ";" .

  VarList<int &framesize, int type, int max>
  =  OneVar<framesize, type, max>
     { WEAK "," OneVar<framesize, type, max> } .

The EnumType parser must return a maximum value for the type it defines and introduces, and must itself make an entry into the symbol table for the associated type name, so that this may later be recognisable if and when it is needed for further variable declarations. Note that this parser increments the lastType counter to define the effective type information for each of the constants and variables that are about to be introduced. This type is also added to the incdectypes set.

  EnumType<TABLE_types &type, int &max>       /* ------------- extended */
  =                             (. int value;
                                   TABLE_entries entry; .)
     "enum" Ident<entry.name>   (. entry.idclass = TABLE_enums;
                                   lastType++; type = lastType;
                                   entry.type = lastType;
                                   incdectypes.incl(lastType);
                                   max = -1; .)
     "{" OneEnum<max, type>
         { "," OneEnum<max, type> }
     "}"                        (. entry.e.maxValue = max;
                                   Table->enter(entry); .) .

The OneEnum parser is very simple. It makes an entry for a constant with the associated (inherited) type into the symbol table. As a side effect it is responsible for incrementing the value for each successive constant automatically:

  OneEnum<int &value, TABLE_types type>
  =                             (. TABLE_entries entry; .)
     Ident<entry.name>          (. value++; entry.c.value = value;
                                   entry.idclass = TABLE_consts;
                                   entry.type = type;
                                   Table->enter(entry); .) .

The OneVar parser is much as before, except that it needs to record the maximum value that a variable can assume. Note that if a variable is initialised then its maximum value is passed to the CGen->assign method so that the correct range checking can be effected.

  OneVar<int &framesize, TABLE_types type, int max>
  =                             (. TABLE_entries entry;
                                   TABLE_types exptype; .)
                                (. entry.idclass = TABLE_vars; entry.v.size = 1;
                                   entry.v.scalar = true; entry.type = type;
                                              /* ------------- extended */
                                   entry.v.max = max;
                                   entry.v.offset = framesize + 1; .)
     Ident<entry.name>
     [ ArraySize<entry.v.size>  (. entry.v.scalar = false; .)
     ]                          (. Table->enter(entry);
                                   framesize += entry.v.size;
                                   if (framesize > maxframe) maxframe = framesize; .)
     [ AssignOp                 (. if (!entry.v.scalar) SemError(210);
                                   CGen->stackaddress(entry.v.offset); .)
       Expression<exptype>      (. if (!compatible(entry.type, exptype)) SemError(218);
                                              /* ------------- extended */
                                   else CGen->assign(entry.v.max); .)
     ] .

The AssignmentOrDeclaration parser is easily written:

  AssignmentOrDeclaration<int &framesize>
  =                             (. TABLE_types vartype, exptype;
                                   TABLE_entries entry; .)
     Designator<classset(TABLE_vars, TABLE_enums), entry, vartype>
     (                          (. if (entry.idclass != TABLE_vars) SemError(206); .)
         AssignOp
         Expression<exptype>    (. if (compatible(vartype, exptype))
                                              /* ------------- extended */
                                     CGen->assign(entry.v.max);
                                   else SemError(218); .)
       | "++"                   (. if (incdectypes.memb(vartype))
                                              /* ------------- extended */
                                     CGen->increment(entry.v.max);
                                     else SemError(218); .)
       | "--"                   (. if (incdectypes.memb(vartype))
                                              /* ------------- extended */
                                     CGen->decrement(entry.v.max);
                                     else SemError(218); .)
       |                        (. if (entry.idclass != TABLE_enums) SemError(206); .)
           VarList<framesize, entry.type, entry.e.maxValue>
     ) WEAK ";" .

3. Brinch Hansen introduced an extended form of the WHILE loop into his language Edison:

       WhileStatement  =  "WHILE" Condition "DO" Statement
                          { "ELSE" Condition "DO" Statement }
                          "END" .

The Conditions are evaluated one at a time in the order written until one is found to be true, when the corresponding Statement is executed, after which the process of evaluating conditions is repeated. If no Condition is true, the loop terminates.

How could stack machine code generation for this be implemented? Have I produced an LL(1) grammar? If not, where not?

  WhileStatement
  =                             (. CGEN_labels startloop, testlabel, dummylabel; .)
     "WHILE"                    (. CGen->storelabel(startloop); .)
     Condition "DO"             (. CGen->jumponfalse(testlabel, CGen->undefined); .)
     Statement                  (. CGen->jump(dummylabel, startloop); .)
     {                          (. CGen->backpatch(testlabel); .)
     "ELSE Condition "DO"       (. CGen->jumponfalse(testlabel, CGen->undefined); .)
     Statement                  (. CGen->jump(dummylabel, startloop); .)
     }                          (. CGen->backpatch(testlabel); .)
     "END" .

The choice of ELSE as the internal keyword is a little awkward - consider what would happen if the internal statement was an IF statement.