Test for week 3

Computer Science 301 - 2016 - Test on Week 3 - Solutions

This should take no longer than 25 minutes and is simply intended to probe whether you have understood the material in the prac questions. YOU MAY NOT USE YOUR TEXTBOOK, NOTES OR A COMPUTER. Answer on the questions sheet (3 sides).

This test was very badly done in general (with some notable exceptions). Please study the answers carefully.

1. You were expecting this one, I bet! What are your name (surname) and student number? [1]

Pat (Terry) 63T0844

2. What do you understand by the term equivalent grammars? Give a short, precise definition! [2]

There is a very simple, concise answer to this, so come to appreciate it!

Two grammars G1 and G2 are equivalent if they define exactly the same language (set of strings).

That is all you really need to say. You might also clarify or express this mathematically by

L(G1) ≡ L(G2) where G1 = { N1, T, S, P1 } and G2 = { N2, T, S, P2 }

It is not necessary that N1 ≡ N2 or that P1 ≡ P2, but of course both grammars must be defined over the
same alphabet T, and have the same goal G.

3. Part of the grammar for the extended Parva language you met in the practical might have been expressed as follows:

   PRODUCTIONS
     Parva            = "void" "main" "(" ")" Block .
     Block            = "{" { Statement } "}" .
     Statement        = (   Block | WhileStatement | IfStatement | DoWhileStatement
                          | ReadStatement | WriteStatement | Assignment
                          | ReturnStatement | EmptyStatement
                          | ConstDeclaration | VarDeclarations
                        ) .
     IfStatement      = "if" "(" Condition ")" Statement
                         { "elsif" "(" Condition ")" Statement }
                         [ "else" Statement ] .
     WhileStatement   = "while" "(" Condition ")" Statement .
     DoWhileStatement = "do"  Statement  "while" "(" Condition ")" ";" .

What (if anything) would be the effect of replacing the productions as follows? Are the productions equivalent? If not, what are the differences in practice? [5]

     Parva            = "void" "main" "(" ")" Block .
     Block            = "{" { Declaration | Statement } "}" .
     Declaration      = ConstDeclaration | VarDeclarations .
     Statement        = (   WhileStatement | IfStatement | DoWhileStatement
                          | ReadStatement | WriteStatement | Assignment
                          | ReturnStatement | EmptyStatement
                        ) .
     IfStatement      = "if" "(" Condition ")" Block
                         { "elsif" "(" Condition ")" Block }
                         [ "else" Block ] .
     WhileStatement   = "while" "(" Condition ")" Block .
     DoWhileStatement = "do" Block "while" "(" Condition ")" ";" .

Very few submissions got this right, so it is useful to comment on some of the misconceptions. The main point that was quite often missed is that the second set of productions forces you to have braces around the subsidiary statement (or statements) in if, while and dowhile statements. You have to write, for example

if (a == b) { c = d; } else { c = f; }

rather than the simpler, C-like code

if (a == b) c = d; else c = f;

which is allowed by the first set of productions. The first set does not forbid writing the braces, of course, as Statement is still able to derive Block. However, an important property is that by requiring the braces one eliminates the "dangling else" ambiguity and non-LL(1) problem from which the first set of productions suffers. If you don't believe me, try submitting the grammars to Coco/R. Another point of interest is that Statement can no longer derive Block, so that blocks cannot be nested - rather silly code like

{ a = b; c = d; { e = f; } g = h; }

would be allowed by the first set of productions, but not by the second set.

Wrong answers received might have included:

(a) "The second set requires you to declare variables before you use them". Alas, no. Both sets are completely equivalent in that respect, and allow declarations and statements to be freely mingled.

In general, productions like

     A = B | C .
     B = "x" | "y" .
     C = "p" | "q" .

and

     A = "x" | "y" | "p" | "q" .
     A = "y" | "p" | "x" | "q" .

or indeed, any other permutations of these, do not force any ordering at all.

(b) "The first set allow you to write meaningless code like

if (a == b) int c;

with only a declaration instead of a statement". Well, yes, but the same apparently useless/meaningless code is allowed by the second set as

if (a == b) { int c; }

Few languages will prevent the dedicated idiots among us from writing meaningless or misleading code, though I suggest C++ is worse in this respect than some other languages I could mention. What do you make of the statements in this gem, which compiles quite happily and illustrates several potentially hard to find bugs?

   int silly() { return (int) silly + silly(); }

   void main () {
     int a, b;
     if (a);           while (silly);         while (silly());
     while (a) {}      if (a) 3;              if (a) a+b;
     if (a) int x; else int y;                42;      -6;
     silly;            silly();               a+++--b;
     ++a + + + - - --b;
   }

4. Show how one might specify a numeric literal in Cocol, allowing for the possibility of numbers of any of the following forms [4]

      123   1.23   1.   .23   1.2E12  1.2E-12   1E6   1E-6   .234E6   .234E-6

Some students had not appreciated two key ideas:

(a) We are looking for a TOKENS definition of a numeric literal, not a PRODUCTIONS definition. A numeric literal is a single token formed of contiguous characters, with no spaces permitted within it (the sequences of tokens permitted and defined by PRODUCTIONS definitions can have spaces between them).

(b) One cannot define one token in terms of others. This point was made in lectures, but probably not emphasized enough.

All of this means that we have to find one (big) regular expression-like way of specifying the pattern of a numeric literal, and, since there are many permissible forms of this, the definition is, indeed, quite complicated. Here are two ways of doing it:

    CHARACTERS
      digit  = "0123456789" .

    TOKENS
      number =   "." digit { digit } [ ( "e" | "E" ) [ "+" | "-" ] digit { digit } ]
               | digit { digit } [ "." { digit } ] [ ( "e" | "E" ) ["+" | "-"] digit { digit } ] .

or

    TOKENS
      number =   ( "." digit { digit } | digit { digit } [ "." { digit } ] )
                    [ ( "e" | "E" ) ["+" | "-"] digit { digit } ] .

It takes some skill (and experience) to make sure that one gets all the options correct, while ruling out a definition that allows totally erroneous numbers like 1.2.3.45, for example. Here is a definition that looks seductively correct:

    TOKENS
      number = [ digit { digit } ] [ "." { digit } ] [ ( "e" | "E" ) ["+" | "-"] digit { digit } ] .

but this allows a "number" to be completely "empty" - devoid of any characters at all.

5. Modify the productions given as the first set in Question 3 so as to add a SwitchStatement to Parva on the lines of the ones found in C# or Java. A (silly) example of such a statement is as follows: [6]

         switch (x + 10) {
           case 10 :
             write("reached case 10"); return;
           case 20:
           case 30:
             if (x != 20) write("x must be 10");
             else write("x = " , x); break;
           default :
             write("x cannot be 0, 10 or 20");
             break;
         } // switch

Here are two attempts at a solution. I have deliberately refrained from commenting further, so that you can have the benefit of an extra exercise. Are these two sets of productions for SwitchStatement equivalent (if not, why not)? What can you say about their correctness (does either define anything other than a syntactically correct SwitchStatement or do either or both allow syntactically incorrect SwitchStatements to be defined?) Can you think of a syntactically correct SwitchStatement that cannot be defined by one or other or both sets of productions? What is the minimal SwitchStatement that they seem to allow? Really?

     Statement        = (   Block | WhileStatement | IfStatement | DoWhileStatement
                          | ReadStatement | WriteStatement | Assignment
                          | ReturnStatement | SwitchStatement | EmptyStatement
                        ) .

     SwitchStatement  = "switch" "(" Expression ")" "{"
                          { CaseLabelList Statement { Statement } }
                          [ "default" ":" { Statement } ]
                        "}" .
     CaseLabelList    = CaseLabel { CaseLabel } .
     CaseLabel        = "case" [ "+" | "-" ] Constant ":" .
     Constant         = IntConst | CharConst | "true" | "false" | "null" .

             SwitchStatement  = "switch" "(" Expression ")" "{"
                                  { OneSwitch }
                                "}" .
             OneSwitch        = CaseLabel ":" { CaseLabel ":" } Statement { Statement } .
             CaseLabel        = "case" Constant | "default" .
             Constant         = IntConst | CharConst | "true" | "false" | "null" .

6. In the practical you were invited to develop a grammar that would describe a list of academic staff, with provision made for titles, initials, names and degrees. A set of productions for this might have been the following

    PRODUCTIONS
      Staff        = { Person } EOF .
      Person       = [ Title ] FullName { "," Degree } "."  .
      FullName     = NameLast { NameLast } .
      NameLast     = initial { initial } name | name .
      Title        = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" | "Mx" .
      Degree       =   "BA" | "BSc" | "BCom" | "BMus" | "BEd" | "BSocSci"
                     | "BSc(Hons)" | "BCom(Hons)" | "MA" | "MSc" | "PhD" .
    END Staff.

(a) Give a precise, mathematical, definition of the concept "A is a nullable non-terminal". [2]

As in the case of question 2, there is a very simple concise answer. Mathematically the definition of "A is nullable" can be expressed:

A ⇒^* ε

In English: It is possible to produce (derive) an empty string by applying appropriate productions to A and to the sentential forms derivable from A.

Of course one may be able to produce (derive) non-empty strings as well. In general this will be the case, as a non-terminal that can derive only the empty string is not very useful.

(b) Identify - giving reasons - the nullable non-terminals in the above set of productions (name and initial are, of course, terminals). [4]

The non-terminals are Staff, Person, Fullname, NameLast, Title and Degree. Title and Degree are clearly not nullable. Don't confuse this with the fact that it is possible to have a valid name without a Title or Degree - once you derive from either Title or Degree you must produce a single terminal! NameLast must derive at least one terminal - name - and similarly FullName must derive at least one name. Person must derive at least a name and a period. That leaves only Staff. In one sense Staff is nullable - because EOF is not really a token, just a convenient way of detecting the end of the input. Scanners will, of course, be primed to return a pseudo-terminal if they run out of input, but a completely empty list would be acceptable to this grammar (everybody gone on holiday?).

The answers received showed up a misconception that many of you have. Consider the production

          A = [ B ] { C } .
          B = x | y .
          C = p | q .

Here A is certainly nullable. Its right side has a sequence of two "nullable" components (or "deletable structures" as Coco calls them). But B and C themselves are not nullable: compare this with an equivalent grammar

              A = B  C  .
              B = [ x | y ] .
              C = { p | q } .

Here A is nullable. Its right side is a sequence with two components B and C. Both of these are now themselves nullable. The same would hold for yet another equivalent grammar:

              A = B  C  .
              B =   x | y
                  | /* eps*/ .
              C =   ( p | q ) C
                  | /* eps*/ .

(c) The PRODUCTIONS section above makes use of the EBNF "meta brackets" { } and [ ] in production rules. How would you write alternative productions for Staff, Person, FullName and NameLast which made no use of parentheses ( ) or of either of the { } or [ ] form of meta brackets or of the Kleene Closure symbol * ? [6]

This was not always well done, yet the transformations required are always easy. If you have a right hand side like

P = Something { Option } SomethingElse .

introduce a new nonterminal RepeatedOptions in place of { Option }

and then rewrite things as

P = Something RepeatedOptions SomethingElse .
RepeatedOptions = Option RepeatedOptions | ε .

Similarly, and even more simply, if you have something like

Q = Startoff [ OneOption ] Finish .

introduce a new nonterminal Possible in place of [ OneOption ]

and then rewrite things as

Q = Startoff Possible Finish .
Possible = OneOption | ε .

With that idea, the PRODUCTIONS section given above might be rewritten mechanically as

    PRODUCTIONS
      Staff          = Persons EOF .
      Persons        = Person Persons | .
      Person         = OptionalTitle FullName Degrees "."  .

      OptionalTitle  = Title | .
      Title          = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" | "Mx" .

      FullName       = NameLast OtherLastNames .
      OtherLastNames = NameLast OtherLastNames | .
      NameLast       = initial MoreInitials name | name .
      MoreInitials   = initial MoreInitials | .

      Degrees        = "," Degree Degrees | .
      Degree         =   "BA" | "BSc" | "BCom" | "BMus" | "BEd" | "BSocSci"
                       | "BSc(Hons)" | "BCom(Hons)" | "MA" | "MSc" | "PhD" .
    END Staff.

But this could also be refactored rather more radically and expressed

    PRODUCTIONS
      Staff6         = Persons EOF .
      Persons        = Person Persons | .
      Person         = OptionalTitle FullName Degrees "."  .

      OptionalTitle  = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" | "Mx" |  .

      FullName       = NameLast OtherLastNames .
      OtherLastNames = NameLast OtherLastNames | .
      NameLast       = Initials name .
      Initials       = initial Initials | .

      Degrees        = "," Degree Degrees | .
      Degree         =   "BA" | "BSc" | "BCom" | "BMus" | "BEd" | "BSocSci"
                       | "BSc(Hons)" | "BCom(Hons)" | "MA" | "MSc" | "PhD" .
    END Staff6.

or even as follows (this is still an LL(1) grammar). (These are all in the solution kit.)

    PRODUCTIONS
      Staff7         = Person Staff7 | EOF .
      Person         = OptionalTitle FullName Degrees SYNC "."  .

      OptionalTitle  = "Professor" | "Dr" | "Mr" | "Mrs" | "Ms" | "Miss" | .

      FullName       = NameLast OtherNames .
      OtherNames     = FullName | .
      NameLast       = Initials name .
      Initials       = initial Initials | .

      Degrees        = "," Degree Degrees | .
      Degree         =   "BA" | "BSc" | "BCom" | "BMus" | "BEd" | "BSocSci"
                       | "BSc(Hons)" | "BCom(Hons)" | "MA" | "MSc" | "PhD" .
    END Staff7.