Computer Science 301 - 2003

Extended Test on Prac 21 - 25 September 2003 - Solutions

Please answer all questions in the spaces provided (six sides). Text books may not be used. Max 60.

This test is mainly intended to check your understanding of important concepts in formal language theory and the construction of grammars, with some questions relating back to earlier sections. It should probably take you about an hour.

1. You must have swotted for hours for this hot spot! What are your name and student number? [1]

          Pat Terry  663T0844

2. (a) For what applications is the language Canntaireachd particularly suitable? [2]

Canntaireachd (pronounced cann-ta-rick) is the language of so-called "vocables" (suggestive syllables) used by pipers to teach each other the classical music of the Highland bagpipe (a music form known as "Ceol Mor" or "big music", also called Piobaireachd).

(b) A student, in trying to describe how to develop a compiler for Canntaireachd hosted in Java came up with the following T-diagram. It's not complete or correct - what should it have looked like? [3]

             .--------------------------.          .--------------------------.
             |        Cannt.M           |          |         Cannt.java       |
             | Cannt  --------->  Java  |          | Cannt  --------->        |
             |                          |          |                          |
             `-------.          .--------------------------.          .-------'
                     |          |         Jikes.exe        |          |
                     |  M-code  | M-Code  ------->   Java  |   Java   |
                     |          |                          |          |
                     `------------------.          .------------------'
                                        |          |
                                        |   Java   |
                                        |          |
                                        `----------'

It might be fun to have a compiler that could read Canntaireachd representations of tunes and convert them to MP3 files for playing on your computer!

             .--------------------------.          .--------------------------.
             |        Cannt.java        |          |        Cannt.class       |
             | Cannt  --------->   MP-3 |          | Cannt  --------->  MP-3  |
             |                          |          |                          |
             `-------.          .--------------------------.          .-------'
                     |          |         Jikes.exe        |          |
                     |   Java   |  Java   -------> JVMCode | JVMCode  |
                     |          |                          |          |
                     `------------------.          .------------------'
                                        |          |
                                        |  M-Code  |
                                        |          |
                                        `----------'

3. (a) What do you understand by the term "interpretive compiler"? [2]

An interpretive compiler could mean either a compiler that produces pseudo-code for a virtual machine (most likely) or a compiler that is itself hosted in a pseudo-code which has to be interpreted by a virtual machine in order for it to compile other programs

(b) Mention one advantage that is claimed for an interpretive compiler. [2]

They are much easier to write than native machine-code compilers; more compact; can be made more user-friendly; can be made more portable etc (see text book)

The code the produce typically runs 50-300 times slower than code produced by a native machine- code equivalent compiler.

4. A language is to be defined over an alphabet {a, b}. Give a regular expression that would describe

(a) possible strings in the language that would contain exactly one occurrence of a [2]

b^* a b^*

(b) possible strings in the language that would contain at least one occurrence of a [2]

(a | b)^* a (a | b)^*

5. A language L restricted by a grammar G is defined by L(G) = {N, T, S, P}. Give a concise mathematical definition (not a lengthy English description) of what is meant if one were to claim that a string s was

(a) a Sentential Form [2]

A sentential form is the goal or start symbol, or any string that can be derived from it:

Sentential form = s | S Þ^* s Ù s Î (N È T )^*

(b) a Sentence [2]

A sentence is a string derivable from S that contains only terminals.

Sentence = w | S Þ^* w Ù w Î T ^*

6. (a) If on the grammar G we impose the restrictions

S Þ^* a A b

for some strings a and b, and

A Þ^* g

for some g Î T ^* , A Î N.

would we describe the grammar as Context-Free, Context-Sensitive, Reduced, or a combination of some of these? [1]

This is the definition of a Reduced Grammar (section 6.3)

(b) Explain very briefly in English what practical effect the two restrictions above have. [4]

The first condition imposes the restriction that every non-terminal is "reachable", that is, will appear in at least one sentential form dervable from the goal symbol. the second condition imposes the restriction that every terminal is "terminating", that is, if the relevant productions are applied to it, can lead to a string consisting only of terminals.

7. An example of a regular language is defined by a grammar with the productions

            A = "a" B | "b" .
            B = "c" A | "d" .

(a) Add a production to this set that would result in the grammar being classified as context-free rather than regular (introduce other terminals and/or non-terminals if you like; the grammar does not have to describe any real language). [2]

Any production with a single non-terminal on the left and a string on the right not limited to a single terminal or a single terminal followed by a single non-terminal. For example:

C = A B "c" .

(b) Add a production to this set that would result in the grammar being classified as context-sensitive rather than regular (introduce other terminals and/or non-terminals if you like; the grammar does not have to describe any real language). [2]

Any production with more than one component on the left, but with at least one non-terminal in the string on the left. For example:

"a" B "c" = A B "c" .

Noam Chomsky

8. A student, in trying to describe mathematical expressions came up with a set of productions

            Expression = Operand { Operator Operand } .
            Operand    = "a" | "b" | "c" | "(" Expression ")" .
            Operator   = "*" | "/" | "+" | "-" .

and when asked to draw a parse tree for the expression (a * (b + c)) drew a diagram

                             Expression
                                 |
                                 |
                              Operand
                                 |
                        .--------+---------.
                        |        |         |
                       "("   Expression   ")"
                                 |
                    .------------+--------------.
                    |            |              |
                 Operand      Operator       Operand
                    |            |              |
                    |            |      .-------+-------.
                    |            |      |       |       |
                   "a"          "*"    "("  Expression ")"
                                                |
                                       .--------+--------.
                                       |        |        |
                                    Operand  Operator Operand
                                       |        |        |
                                      "b"      "+"      "c"

(a) Has the student drawn a concrete syntax tree or an abstract syntax tree? [1]

This is a concrete syntax tree.

(b) Draw the other form of tree. [3]

ASTs are much simpler!


                               .---- * ----.
                               |           |
                               |           |
                               a      .--- + ----.
                                      |          |
                                      b          c

(c) The tutor suggests that one can write a far better grammar than this. What is the reason for this claim, and what form would the productions of a better grammar take? [4]

The grammar does not distinguish the precedence of the various operators. There are various ways of improving it; the classic EBNF way is to write

            Expression = Term { AddOp Term } .
            Term       = Factor { MulOp Factor } .
            Factor     = "a" | "b" | "c" | "(" Expression ")" .
            MulOp      = "*" | "/" .
            AddOp      = "+" | "-" .

9. (a) What is meant by the statement "a non-terminal A is nullable"? [2]

It is possible to derive an empty string by applying appropriate productions to A and to the sentential forms derivable from A. Of course one may be able to produce non-empty strings as well; in general this will be the case. A non-terminal that can derive only the empty string is not very useful.

(b) In the grammar defined by the following productions, identify the nullable non-terminals [2]

            A = B "a" | C D .
            B = [ A ] .
            C = B { "a" } .
            D = "a" B .

The nullable non-terminals are B (fairly obviously) and C (less obviously, but since B is nullable and the sequence { "a" } is nullable the concatenation of the two is also nullable).

10. (a) What is meant by the statement "a grammar is ambiguous"? [2]

There must be at least one string in the language that the grammar defines which can be parsed in at least two ways. There may be many such strings in general (see the example in grammar G9 in section 6.4), but the presence of just one such string is enough to cause ambiguity in the grammar as a whole (which may be one reason why it is hard to establish whether an arbitrary, large, grammar is ambiguous).

(b) Why is the following grammar that describes a list of Roman numbers ambiguous? [2]

            Roman  = Number { "," Number }  "."  EOF .
            Number = StartI | StartV | StartX .
            StartI = "I" ( "V" | "X" | [ "I" [ "I" ] ] ) .
            StartV = "V" [ "I" ] [ "I" ] [ "I" ] .
            StartX = "X" .

Either of the strings V I I and V I can be produced in three ways

            StartV = "V" [ "I" [ "I" [ "I" ] ] ] .

11. The grammar for Parva you met in the last practical had productions reading

      Statement         =   Block | ConstDeclarations | VarDeclarations | Assignment
                          | IfStatement | WhileStatement | ReturnStatement | HaltStatement
                          | ReadStatement | WriteStatement
                          | ";" .
      ConstDeclarations = "const" OneConst { "," OneConst } ";" .
      OneConst          = identifier "=" Constant .
      VarDeclarations   = Type OneVar { "," OneVar } ";" .
      Assignment        = Designator "=" Expression ";" .
      Designator        = identifier [ "[" Expression "]" ] .
      IfStatement       = "if" "(" Condition ")" Statement .
      WhileStatement    = "while" "(" Condition ")" Statement .
      ReturnStatement   = "return" ";" .
      HaltStatement     = "halt" ";" .
      ReadStatement     = "read" "(" ReadElement { "," ReadElement } ")" ";" .
      WriteStatement    = "write" "(" WriteElement { "," WriteElement } ")" ";" .

(The other productions are not needed for this question).

(a) This grammar allows one to write statements like

            if (a > b) ;
            while (c < 10) int i;
            if (a == c) const max = 10;

which might strike you as a little pointless. Suggest how the grammar could be changed to forbid such statements. [2]

An acceptable answer would be to classify statements into two sub-categories

      Statement         = Active | Passive .
      Active            = Block | Assignment
                          | IfStatement | WhileStatement | ReturnStatement | HaltStatement
                          | ReadStatement | WriteStatement .
      Passive           = ConstDeclarations | VarDeclarations | ";" .
      IfStatement       = "if" "(" Condition ")" Active .
      WhileStatement    = "while" "(" Condition ")" Active .

This does not completely fix the "problem" - you could then abuse Block to write statements like

if (a > b) {} while (c < 10) { int i; } if (a == c) { const max = 10; }

As an interesting exercise, see if you can find a way f stamping out this sort of nonsense as well.

Most compilers/language definitions would not bother. You can be protective just so long - eventually you have to accept that a useful language will always provide ways to write meaningless programs if you really want to do so!

(b) Time to upgrade to Parva++? How would you modify the grammar to allow statements of the form [2]

            x++;
            list[i]--;

We want an LL(1) compliant solution if possible - here it is:

      Assignment        = Designator ( "=" Expression | "++" | "--" ) ";" .

12. In the land of Phool the official language is Phoolish. A Phoolish professor noticed that many of his students had not mastered the syntax of Phoolish well. Tired of correcting their many syntactical mistakes, he decided to challenge the students and asked them to write a program that could check the syntactical correctness of any sentence they wrote. Similar to the nature of Phools, the syntax of Phoolish is also pleasantly simple. Here are the rules:

0. The characters in the language are the 26 letters of the standard western alphabet and the language distinguishes between lower and upper case letters.

1. Every lower case letter is a simple correct sentence.

2. If s is a correct sentence, then so is a sentence consisting of an upper case vowel followed by s, that is As Es Is Os or Us

3. If s and t are correct sentences, then so is a sentence consisting of an upper case consonant followed by st, for example Bst Cst Dst ... Zst.

4. Rules 0 to 3 are the only rules that determine the syntactical correctness of a sentence.

Complete the following Cocol description of this language [8]

        COMPILER Phoolish  /* given in test */
        IGNORE CHR(1) .. CHR(31)
        CHARACTERS /* if necessary or useful */
        TOKENS   /* if necessary or useful */
        PRODUCTIONS
           Phoolish =
        END Phoolish.

This one is deceptively simple, once you see that it is necessary to choose character sets and token definitions appropriately!

        COMPILER Phoolish $CN
        IGNORE CHR(1) .. CHR(31)
        CHARACTERS
          letter    = "abcdefghijklmnopqrstuvwxyz" .
          consonant = "BCDFGHJKLMNPQRSTVWXYZ" .
          vowel     = "AEIOU" .
        TOKENS
          Lletter    = letter .
          Uconsonant = consonant .
          Uvowel     = vowel .
        PRODUCTIONS
          Phoolish = Lletter | Uvowel Phoolish | Uconsonant Phoolish Phoolish .
        END Phoolish.