Computer Science 3 - 2000

Test 2 on Translators - 25 October 2000 solutions - C++ version

Please answer all questions 0 ... 6, in the spaces provided (six sides). Text books may not be used.

0. What are your name (surname!) and student number? [ 1 mark ]

1. Formally, a grammar G is a quadruple { N, T, S, P } with the four components

(a) N - a finite set of non-terminal symbols,
(b) T - a finite set of terminal symbols,
(c) S - a special goal or start or distinguished symbol,
(d) P - a finite set of production rules or, simply, productions.

where a production relates to a pair of strings, say α and β, specifying how one may be transformed into the other:

α → β α ∈ (N ∪ T )^* N (N ∪ T )^* , β ∈ (N ∪ T )^*

and formally we can define a language L(G) produced by a grammar G by the relation

L(G) = { σ | σ ∈ T ^* ; S ⇒^* σ }

In terms of this notation, express precisely (that is to say, mathematically; we do not want a long essay or English description) what you understand by [ 3 marks each ]

(a) A context-sensitive grammar

The number of symbols in the string on the left of any production is less than or equal to the number of symbols on the right side of that production. In fact, to qualify for being of type 1 rather than of a yet more restricted type, it is necessary for the grammar to contain at least one production with a left side longer than one symbol.

Productions in type 1 grammars (context-sensitive) are of the general form

α → β with | α | ≤ | β | , α ∈ (N ∪ T )^* N (N ∪ T )^* , β ∈ (N ∪ T )⁺

In another definition, productions are required to be limited to the form

αAβ → αγβ with α, β ∈ (N ∪ T )^*, A ε N⁺, γ ∈ (N ∪ T )⁺

although examples are often given where productions are of a more general form, namely

αAβ → ζγξ with α, β, ζ, ξ ∈ (N ∪ T )^*, A ∈ N⁺, γ ∈ (N ∪ T )⁺

(b) FIRST(A) where A ∈ N

The FIRST set is best defined by

a ∈ FIRST(A) if A ⇒⁺ aζ (A ∈ N ; a ε T ; ζ ∈ (N ∪ T ) ^* )

This is bookwork, and you should know these sorts of definitions!

2. For the last week or two you have been creating and running a Topsy compiler. Complete the set of T- diagrams below so as to represent the process of producing the compiler and then compiling a simple program like the VOTE.TOP that formed part of the prac kit. [6 marks ]

                                                                   ┌──────────────┐                ┌──────────────┐
                                                                   │   VOTE.TOP   │                │   VOTE.STK   │
                                                                   │              │                │              │
 ┌──────────────────────────┐          ┌──────────────────────────┐└────┐    ┌────┴────────────────┴────┐    ┌────┘
 │         TOPSY.ATG        │          │        TOPSY.CPP         │     │TOP │        TOPSY.EXE         │STK │
 │ TOPSY  ──────────>  STK  │          │ TOPSY ───────────>  STK  │     └────┤ TOPSY ───────────>  STK  ├────┤
 │                          │          │                          │          │                          │    │  STKMC.EXE
 └───────┐          ┌───────┴──────────┴───────┐          ┌───────┴──────────┴───────┐          ┌───────┤INT │ Interpreter
         │          │          CR.EXE          │          │          BCC.EXE         │          │    ┌──┴────┴──┐
         │  Cocol   │ Cocol  ────────>    C++  │   C++    │  C++   ────────>   80x86 │   80x86  │    │   80x86  │
         │          │                          │          │                          │          │    └──────────┘
         └──────────┴───────┐          ┌───────┴──────────┴───────┐          ┌───────┴──────────┘
                            │          │                          │          │       ┌──────────┐
                            │   80x86  │                          │   80x86  │       │   80x86  │
                            │          │                          │          │       └──────────┘
                            └──────────┘                          └──────────┘
                            ┌──────────┐                          ┌──────────┐
                            │   80x86  │                          │   80x86  │
                            └──────────┘                          └──────────┘

2a (Follow up question) A few days ago you developed a system that would allow you to convert programs written in Clang to the equivalent programs written in Topsy. Complete the set of T-diagrams below so as to represent the process of producing the converter as a program CLN2TOP and then using this to convert the simple program VOTE.TOP that formed part of the prac kit to VOTE.TOP. Hint: You may need to draw other/more blocks than those shown here. [8 marks ]

                                                                   ┌──────────────┐                ┌──────────────┐
                                                                   │   VOTE.CLN   │                │   VOTE.TOP   │
                                                                   │              │                │              │
 ┌──────────────────────────┐          ┌──────────────────────────┐└────┐    ┌────┴────────────────┴────┐    ┌────┘
 │         CL2TOP.ATG       │          │         CL2TOP.CPP       │     │    │         CL2TOP.EXE       │    │
 │ CLANG  ──────────> TOPSY │          │ CLANG ───────────> TOPSY │     └────┤ CLANG ───────────> TOPSY ├────┘
 │                          │          │                          │          │                          │
 └───────┐          ┌───────┴──────────┴───────┐          ┌───────┴──────────┴───────┐          ┌───────┘
         │          │          CR.EXE          │          │          BCC.EXE         │          │
         │  Cocol   │ Cocol  ────────>    C++  │   C++    │  C++   ────────>   80x86 │   80x86  │
         │          │                          │          │                          │          │
         └──────────┴───────┐          ┌───────┴──────────┴───────┐          ┌───────┴──────────┘
                            │          │                          │          │       ┌──────────┐
                            │   80x86  │                          │   80x86  │       │   80x86  │
                            │          │                          │          │       └──────────┘
                            └──────────┘                          └──────────┘
                            ┌──────────┐                          ┌──────────┐
                            │   80x86  │                          │   80x86  │
                            └──────────┘                          └──────────┘

3. By rephrasing them so as to eliminate the metabrackets or otherwise, analyse the following set of productions, and explain clearly where they do or do not conform to the rules for LL(1) parsing. Assume that the identifers used for children's names are all distinct, unique terminals. [8 marks ]

      Home = Family { Pets } [ Transport ] "house" .
      Pets = "dog" [ "cat" ] | "cat" .
      Transport = "car" | "bicycle" .
      Family = Parents { Child } .
      Parents = "Dad" "Mom" | "Mom" "Dad" | "Dad" | "Mom" .
      Child = identifier .

Eliminating the meta brackets causes few problems:

      Home      = Family Animals Transport "house"               (1)
      Animals   = Pets Animals | ε .                             (2)
      Pets      = "dog" Moggie | "cat" .                         (3)
      Moggie    = "cat" | ε .                                    (4)
      Transport = "car" | "bicycle" | ε .                        (5)
      Family    = Parents Children .                             (6)
      Children  = Child Children | ε .                           (7)
      Parents   = "Dad" "Mom" | "Mom" "Dad" | "Dad" | "Mom" .    (8)
      Child     = identifier .                                   (9)

The rules with non-empty alternatives are (3), (5) and (8). (3) and (5) clearly satisfy the LL(1) rule 1. (8) clearly does not as Parents₁ and Parents₃ both have "Dad" in their FIRST set; similarly Parents₂ and Parents₄ both have "Mom" in their first sets.

The nullable non-terminals are Animals, Moggie, Transport and Children. For these we must look at the FIRST and FOLLOW sets.

Clearly

FIRST(Animals) = FIRST(Pets) = { "cat", "dog" }
FIRST(Moggie) = { "cat" }
FIRST(Transport) = { "car", "bicycle" }
FIRST(Children) = { identifier }

while

FOLLOW(Animals) = FIRST(Transport) U "house" = { "car", "bicycle", "house" }
FOLLOW(Moggie) = FOLLOW(Pets) = FIRST(Animals) U FIRST(Transport) U "house"
= { "cat", "dog", "car", "bicycle", "house" }
FOLLOW(Transport) = { "house" }
FOLLOW(Children) = FOLLOW(Family) = { "cat", "dog", "car", "bicycle", "house" }

So Rule 2 is broken for Moggie. Put another way, a sequence like

"dog" "cat" "cat"

can be parsed in two ways - the first "cat" might have come from Moggie producing "dog cat", or directly from another application of Pets₂.

Write an equivalent grammar that does conform to the LL(1) rules. [ 4 marks ]

      Home = Family { Pets } [ Transport ] "house" .
      Pets = "dog"  | "cat" .
      Transport = "car" | "bicycle" .
      Family = Parents { Child } .
      Parents = "Dad" [ "Mom" ] | "Mom" [ "Dad" ] .
      Child = identifier .

4. A Keen Type has decided to extend Clang to allow programmers to incorporate exponentiation into their expressions, for example, writing x^2 + y^3 to mean x² + y³. She has had two Bright Ideas. One is to change the productions to

     Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } .
     Term       = Factor { MulOp Factor } .
     Factor     = Primary [ "^" Expression ] .    // --------------- here
     Primary    = Designator | number | "(" Expression ")" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

and the other is to make the change

Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Primary [ "^" Factor ] . // --------------- here Primary = Designator | number | "(" Expression ")" . AddOp = "+" | "-" . MulOp = "*" | "/" .

Explain to her which is the better choice, giving reasons, perhaps illustrated by example. [ 5 marks ]

The second choice is better. The first one is ambiguous, and non-LL(1). I had hoped that people would see this by realising that the expression x^2 + y^3 would appear to mean x^(2 + y³) since the "2 + y^3" is itself a complete Expression.

The follow up test invited you to investigate this further. If we rewrite the first grammar to eliminate the metabrackets we get

     Expression = ( "+" Term | "-" Term | Term ) TailExp .
     TailExp    = AddOp Term TailExp | ε .
     Term       = Factor TailTerm .
     TailTerm   = MulOp Factor TailTerm | ε .
     Factor     = Primary TailFactor .
     TailFactor = "^" Expression | ε .
     Primary    = Designator | number | "(" Expression ")" .
     AddOp      = "+" | "-" .
     MulOp      = "*" | "/" .

The nullable nonterminals here are TailExp, TailTerm and TailFactor.

FIRST(TailExp) = { "+" , "-" }
FIRST(TailTerm) = { "*" , "/" }
FIRST(TailFactor) = { "^" }

The FOLLOW sets are a little harder to see because to get to closure one has to chase through quite a few other productions:

FOLLOW(TailExp) = FOLLOW(Expression)
FOLLOW(TailTerm) = FOLLOW(Term) = FIRST(TailExp) U FOLLOW(Expression)
FOLLOW(TailFactor) = FOLLOW(Factor) = FIRST(TailTerm) U FOLLOW(Term)

You are invited to track these through in detail; the outcome is that they are all the same:

FOLLOW(TailExp) = { "*" , "/" , "+" , "=" , ")" }
FOLLOW(TailTerm) = { "*" , "/" , "+" , "=" , ")" }
FOLLOW(TailFactor) = { "*" , "/" , "+" , "=" , ")" }

and so Rule 2 is broken for TailExp and for TailTerm.

In the follow-up test you were invited to demonstrate the ambiguous nature of the grammar by considering the expression 4^5*6. This can indeed be parsed in two ways, one with the implicit meaning of 4^(5*6) and the other with the meaning of (4^5)*6. The parse trees would look like this (a few intermediate nodes have been omitted to save space, but most students correctly gave the trees in full detail

           Expression                                        Expression
               │                                                 │
             Term                                              Term
               │                                                 │
               │                                       ┌─────────┼────────┐
               │                                       │         │        │
            Factor                                  Factor      "*"     Factor
               │                                       │                  │
        ┌──────┼─────────┐                             ├──┬──────┐        │
        │      │         │                             │  │      │        │
     Primary  "^"   Expression                    Primary ^ Expression    │
        │                │                             │         │        │
        │               Term                           │       Term       │
        │         ┌──────┼─────┐                       │         │        │
        │         │      │     │                       │         │        │
     Number     Factor  "*"  Factor                Number    Factor       │
        │         │            │                       │         │        │
        │         │            │                       │         │        │
        4         5            6                       4         5        6

When she finally comes to implement the Brighter Idea, she is advised to do so by adding a special interpreter opcode to the system, say MC_pow. Always ready for a challenge, she tries writing code of the form

        case STKMC_pow:
        // POP TOS and SOS, raise SOS to power of TOS, push power to form new TOS
          cpu.sp++;
          if (inbounds(cpu.sp))
          { int power = 1;
            for (int i = 1; i <= mem[cpu.sp - 1]; i++) power *= mem[cpu.sp];
            mem[cpu.sp] = power;
          }
          break;

To her dismay this does not work properly. Why not? You do not have to suggest correct code, simply point out why this code does not work. [ 5 marks ]

The reason is that if one wants to determine X^Y where Y is negative, one has to remember that X^-Y = 1/X^Y. The simple FOR loop will not work in this case. In fact, for an integer machine we should remember that X^Y = 0 if X = 0, and that if Y is negative 1/X^Y will always truncate to zero. So the way to write the code would be

        case STKMC_pow:
        // POP TOS and SOS, raise SOS to power of TOS, push power to form new TOS
          cpu.sp++;
          int power = 1;
          if (inbounds(cpu.sp))
          { if (mem[cpu.sp] == 0 || mem[cpu.sp - 1] < 0) power = 0;
            else for (int i = 1; i <= mem[cpu.sp - 1]; i++) power *= mem[cpu.sp];
            mem[cpu.sp] = power;
          }
          break;

Many students seemed to think the code was incorrect because it would compute Y^X rather than X^Y, but that is not the case, and this is a little worrying. Stack code for computing X^Y would be of the form

             PSH -X
             PSH -Y
             POW

and at the instant the POW was to be executed the relevant portion of memory would look like this

        ──┬──────┬─────┬─────┬─
          │  Y   │  X  │     │
        ──┴──────┴─────┴─────┴─
             ^
             |
           cpu.sp

The operation cpu.sp++ would move the stack pointer to point at X, and the value of Y (the exponent) would then be stored at mem[cpu.sp-1] - remember that we grow (push onto) the stack in this system by decrementing cpu.sp, in common with most real machine architectures.

5. How many times have you cursed yourself for writing code like

      for (i = 0; i < 10; i++);        or      if (x == 10);
        { /* loop body */                        { /* some code */
        }                                        }

and not noticed the semicolon at the end of the line which should not have been there? [ 0 marks! ]
Okay, let's fix this problem. Below you can see most of the grammar for Statements in Topsy. How would you modify this so that it would not willingly accept code like the above? At the same time, one must be able to write ridiculous statements with no "body" if we really want to do so, so your solution should still allow a way of doing this. Hint: this can either be done by writing a whole lot of special syntax, or using a semantic constraint. Try doing it using a simple static semantic constraint, that is, add (a very small number of) actions/attributes/parameters as necessary to the syntax as given. [ 5 marks ]

Statement = Block | Assignment | IfStatement | WhileStatement | ForStatement | DoWhileStatement | ReadStatement | WriteStatement | ReturnStatement | EmptyStatement | ConstDeclaration | VarDeclarations<framesize> Block = "{" { Statement } "}" . Assignment = Variable ( "=" Expression | "++" | "--" ) ";" . IfStatement = "if" "(" Condition ")" Statement [ "else" Statement ] . WhileStatement = "while" "(" Condition ")" Statement . DoWhileStatement = "do" Statement "while" "(" Condition ")" ";" . ForStatement = "for" "(" identifier "=" Expression ";" Condition [ ";" Epilogue ] ")" Statement . Epilogue = identifier ( "=" Expression | "++" | "--" ) . ReadStatement = "cin" ">>" Variable { ">>" Variable } ";" . WriteStatement = "cout" "<<" WriteElement { "<<" WriteElement } ";" . ReturnStatement = "return" ";" . EmptyStatement = ";" .

This is a nice example where the use of "inheriting" an attribute, rather than synthesizing one, provides a quick solution. The insight needed is to see that in the "context" of parsing an IfStatement, WhileStatement or ForStatement, the sub-parser responsible for parsing the secondary Statement can be told that EmptyStatements are not acceptable. This we do by simply passing the subparser an appropriate Boolean flag. The altered productions look like this:

     Statement<bool AllowEmpty>
     =    Block            | Assignment      | IfStatement
        | WhileStatement   | ForStatement    | DoWhileStatement
        | ReadStatement    | WriteStatement  | ReturnStatement
        | ConstDeclaration | VarDeclarations<framesize>
        | EmptyStatement      (. if (!AllowEmpty) (. SemError(1000); .)

     Block = "{" { Statement<true> } "}" .
     IfStatement = "if" "(" Condition ")" Statement<false> [ "else" Statement<false> ] .
     WhileStatement = "while" "(" Condition ")" Statement<false> .
     DoWhileStatement = "do" Statement<true> "while" "(" Condition ")" ";" .
     ForStatement
     =  "for" "(" identifier "=" Expression ";" Condition [ ";" Epilogue ] ")"
        Statement<false> .

6. A Smart Alick, faced with learning to program in Topsy pointed out to me that he thought there really was no need for the special ConstDeclaration production. After all, he said, in C++, variable declarations can just incorporate the modifier const to produce exactly the effect one needs. Here are some variable declarations and statements he showed me to make his point:

         int c, d, e = 10, f = 12;                // real "variable" variables
         const int X = 12;  const char CH = 'a';  // "constant" variables

         /* later in the program */
         X = e;   // should be forbidden
         e = X;   // should be allowed

This intrigued me, so I decided to give it a whirl, and after a few simple modifications to the code below came up with what looked as though it would satisfy Mr Smart Alick.
(a) What did I have to change, and how did I change it? (make changes on the code) [ 5 marks ]

Block = "{" { ConstDeclarations | VarDeclarations<framesize> | Statement } "}" . ConstDeclaration = (. int value; TABLE_entries entry; .) "const" Ident<entry.name> (. entry.idclass = TABLE_consts; .) WEAK "=" ( Number<value> (. entry.c.value = value; entry.type = TABLE_ints; .) | Char<value> (. entry.c.value = value; entry.type = TABLE_chars; .) | "true" (. entry.c.value = 1; entry.type = TABLE_bools; .) | "false" (. entry.c.value = 0; entry.type = TABLE_bools; .) ) WEAK ";" (. Table->enter(entry); .) . VarDeclarations<int &framesize> = (. TABLE_types type; .) ( "int" (. type = TABLE_ints .) | "char" (. type = TABLE_chars .) | "bool" (. type = TABLE_bools .) ) OneVar<framesize, type> { WEAK "," OneVar<framesize, type> } WEAK ";" . OneVar<int &framesize, TABLE_types type> = (. TABLE_entries entry; TABLE_types exptype; .) (. entry.idclass = TABLE_vars; entry.v.size = 1; entry.v.scalar = true; entry.type = type; entry.v.offset = framesize + 1; .) Ident<entry.name> [ ArraySize<entry.v.size> (. entry.v.scalar = false; .) ] (. Table->enter(entry); framesize += entry.v.size; if (framesize > maxframe) maxframe = framesize; .) [ "=" (. if (!entry.v.scalar) SemError(210); .) Expression<exptype> (. if (!compatible(entry.type, exptype)) SemError(218); else CGen->store(entry.v.offset, entry.type == TABLE_chars); .) ] . ArraySize<int &size> = "[" Number<size> (. if (size == 0) SemError(228); .) "]" . // Handle symbol table for Topsy level 1 parser const int tablemax = 100; // max number of identifiers const int TABLE_alfalength = 15; // maximum length of identifiers typedef char TABLE_alfa[TABLE_alfalength + 1]; enum TABLE_idclasses { TABLE_consts, TABLE_vars, TABLE_progs }; enum TABLE_types {TABLE_none, TABLE_ints, TABLE_chars, TABLE_bools }; struct TABLE_entries { TABLE_alfa name; // identifier TABLE_idclasses idclass; // class int self; TABLE_types type; union { struct { int value; } c; // constants struct { int size, offset; bool canchange, scalar; } v; // variables }; }; class TABLE { public: void enter(TABLE_entries &entry); // Adds entry to symbol table void search(char *name, TABLE_entries &entry, bool &found); // Searches table for presence of name. If found then returns entry void protect(TABLE_entries entry); // Alters entry to prevent the variable from being altered void unprotect(TABLE_entries entry); // Alters entry to allow the variable to be altered again void openscope(int framesize); // Opens a scope record at the start of parsing a statement block void closescope(int &framesize); // Closes a scope record at the end of parsing a statement block TABLE(REPORT *R); // Initializes symbol table };

There are various ways of doing this. Essentially we aim to knock ConstDeclarations out altogether:

  Block =  "{" { VarDeclarations<framesize> | Statement } "}" .

Then we can change the syntax of VarDeclarations to allow an optional "const" to appear:

  VarDeclarations = [ "const" ] ( "int" | "char" | "bool" ) OneVar { "," OneVar } ";" .

However, we need to pass an appropriate parameter to the OneVar parser so that this can "inherit" the property that needs to be recorded in the symbol table:

  VarDeclarations<int &framesize>
  =                             (. TABLE_types type; bool isConst = false; .)
     [   "const"                (. isconst = true; .) ]
     (   "int"                  (. type = TABLE_ints .)
       | "char"                 (. type = TABLE_chars .)
       | "bool"                 (. type = TABLE_bools .)
     )
     OneVar<framesize, type, isconst> { WEAK "," OneVar<framesize, type, isconst> } WEAK ";" .

Finally, the OneVar parser can either mark the symbol table entry's idclass or canchange field. (You might like to consider whether there is any advantage to be had one way or the other.) We illustrate the use of the idclass field below. If the variable is to be marked as constant we shall also have to insist on the defining expression being present:

  OneVar<int &framesize, TABLE_types type, bool isConst>
  =                             (. TABLE_entries entry;
                                   TABLE_types exptype; .)
                                (. if (isConst) entry.idclass = TABLE_consts;
                                   else entry.idclass = TABLE_vars;
                                   entry.v.size = 1;
                                   entry.v.scalar = true; entry.type = type;
                                   entry.v.offset = framesize + 1; .)
     Ident<entry.name>
     [ ArraySize<entry.v.size>  (. entry.v.scalar = false; .)
     ]                          (. Table->enter(entry);
                                   framesize += entry.v.size;
                                   if (framesize > maxframe) maxframe = framesize; .)
     (   "="                    (. if (!entry.v.scalar) SemError(210); .)
         Expression<exptype>    (. if (!compatible(entry.type, exptype)) SemError(218);
                                   else CGen->store(entry.v.offset, entry.type == TABLE_chars); .)
       |                        (. if (isConst) SemError(1001); // must be defined .)
     ] .

(b) How would my system have reacted to a further declaration (after those above) of the form [ 5 marks ]

const int Z = X * c;

The above scheme would, of course accept it. It would generate the code for performing the assignement, and then flag the identifier Z as constant.

This raises interesting questions - suppose we have code like this:

         void main (void) {
           for (int i = 1; i < 10; i++)  {
             const int z = i; cout << z;
           }
         }

Should it be allowed; and what, if anything, should it produce as output? Surely each time through the loop it is effectively changing a variable flagged as constant? Think about it, try it out on your favourite C++ compiler, and comment on or explain the results!