Test for week 21

Computer Science 3 - 2011 - Test on Week 21

Please answer all questions in the spaces provided (three sides). Text books may not be used. Max 35.

This test is intended to check your understanding of important concepts in formal language theory and the construction of grammars, with some questions relating back to earlier sections. It should probably take you about half an hour.

1. Who says it's hard to spot for my tests? What are your name and student number? [1]

Pat Terry 63T0884

2. A language L restricted by a grammar G is defined by L(G) = {N, T, S, P}.

(a) What do you understand by the terms Sentential Form and Sentence? [2]

A sentential form is the goal or start symbol, or any string that can be derived from it:

Sentential form = s | S Þ^* s Ù s Î (N È T )^*

A sentence is a string derivable from S that contains only terminals.

Sentence = w | S Þ^* w Ù w Î T ^*

(b) What do you understand by the terms Terminal and Non-Terminal? [2]

Terminals are the atomic (indivisible) elements of a language that appear in sentences. Non- terminals are useful descriptors of syntactic elements that are useful in describing the "phrase structure" of sentences - ideas like "function call" or "variable declaration" if one is talking about computer languages, or "qualified noun" or "dependent clauses" if one is talking about English sentences.

In terms of L(G) we simply have that a is a terminal if a Î T and A is a non-terminal if A Î N.

3. (a) What is meant by the statement "a grammar is ambiguous"? [2]

There is a very simple way to express this - a grammar is ambiguous if at least one sentence in the language that it defines can be parsed in at least two ways (that is can have more than one parse tree rooted in the goal symbol and with identical leaf nodes). Note that we are talking about a single grammar - we are not talking about two equivalent but distinct grammars that might each be able to describe the same sentences.

(b) Demonstrate that the following grammar that describes phrases like "lucky girl" or "poor silly twisted boy" is ambiguous. [4]

            Phrase     = Adjectives Noun.
            Adjectives =   Adjectives Adjectives
                         | "silly" | "twisted" | "lucky" | "clever" | "poor" | e .
            Noun       = "boy" | "girl" .

To answer this sort of question you must be able to draw two parse trees (or specify two distinct expansions from the goal symbol in some other way) that lead to the same sentence. There are many examples one can use here, and most people could come up with one. The real menace here is that the option

Adjectives = Adjectives Adjectives

can lead to another string like

Adjectives = Adjectives Adjectives Adjectives

in two ways - either by applying the option again to the leftmost Adjectives or to the rightmost Adjectives. So here is an example where we can draw two distinct trees:

                                      Phrase                                     Phrase
                             .-------------------.                .-------------------------.
                             |                   |                |                         |
                        Adjectives              Noun          Adjectives                  Noun
                    .------------------.         |         .----------------.               |
                    |                  |         |         |                |               |
                Adjectives          Adjectives   |      Adjectives      Adjectives          |
               .-----------.           |         |         |          .-----------.         |
               |           |           |         |         |          |           |         |
             Adjectives Adjectives     |         |         |       Adjectives Adjectives    |
               |           |           |         |         |          |           |         |

            "poor"      "silly"    "twisted"   "boy"    "poor"      "silly"    "twisted"   "boy"

This is really very easy! One solution is:

            Phrase     = { Adjectives } Noun.
            Adjectives = "silly" | "twisted" | "lucky" | "clever" | "poor" .
            Noun       = "boy" | "girl" .

Be careful not to write, as one might be tempted to do

            Phrase     = { Adjectives } Noun.
            Adjectives = "silly" | "twisted" | "lucky" | "clever" | "poor" | e.
            Noun       = "boy" | "girl" .

which is actually ambiguous once more. Curly braces around { Adjectives } already implies an e option, and you don't need/want to have two ways of doing this!

Another grammar might be a recursive one on the lines of one suggested in the book somewhere:

            Phrase     = Adjective Phrase | Noun.
            Adjective  = "silly" | "twisted" | "lucky" | "clever" | "poor" .
            Noun       = "boy" | "girl" .

4. A grammar for expressions like the one you extended in the practical should be familiar

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = "a" | "b" | "c" | "(" Expression ")" .

Suppose we also wanted our grammar to be able to describe expressions in which entities could be raised to a power - an operation denoted by "^", and exemplified by a ^ b or (a + b) ^ c or a ^ b ^ c * d .

The exponentiation operator (as "^" is called) is a high precedence operator and associates from right to left so that a ^ b ^ c means a ^ (b ^ c) and not (a ^ b) ^ c. Suggest how this operator would be incorporated in an extension of the grammar above. [5]

The hint was given that this is a high precedence operator. It's higher than * and /, but lower than (), and the correct solution is to introduce another level into the hierarchy. To get the correct "right associativity" requires an option in a right recursive production

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = Primary [ "^" Factor ] .
             Primary    = "a" | "b" | "c" | "(" Expression ")" .

Now as an interesting exercise you might like to ponder why you cannot write, instead

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = Primary [ "^" Expression ] .
             Primary    = "a" | "b" | "c" | "(" Expression ")" .

and you might also like to ponder whether the following alternative is acceptable:

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = { Primary "^" } Primary .
             Primary    = "a" | "b" | "c" | "(" Expression ")" .

5. (a) What is meant by the statement "a non-terminal is nullable"? [2]

Students tend to submit solutions like: "A is nullable if it has a production rule like A = a | e ", or a similar restricted example. But it's better to say, more simply

A is nullable if A Þ^* e

as the production rules for A may not have any explicit e in them, but possibly specify a list of alternative sentential forms from which it is eventually possible to derive an empty string - this might involve chasing through several steps before it becomes apparent!

In English - a non-terminal A is nullable if it is possible to apply productions to the sentential forms derived from A in such a way that a null sentence can be derived.

(b) In the grammar defined by the following productions, identify (giving reasons) the nullable non- terminals. [3]

            A = B "a" | C { D } .
            B = [ A "b" ] .
            C = B { "a" } .
            D = "a" B "c" .

Most people quickly see that B is obviously nullable. Since C can derive an empty string from B followed by an empty string from { "a" }, C is also nullable. Less obvious, perhaps, is that A is also nullable. as the rule A = C { D } can lead to an empty string as well.

6. Complete the following Cocol specification for a program that would recognize a list of pop songs of my era (your parents might have sung along to some of them too). Here is a sample of such a list (I don't remember the years of all of them, so one might leave the year out - and some were recorded several times): [10]

"Please, Please Me" (The Beatles) [1963 ] .
"I wanna hold your hand" (The Beatles) .
"She loves you, yeah, yeah, yeah" ( The Beatles ).
"All I have to do is dream" (Everly Brothers) , (Roy Orbison) [ 1959 ].
"All I have to do is dream" (Richard Chamberlain) [ 1963 ].
"Sheila" (Tommy Roe) [ 1962 ] .
"Young Girl" (Gary Puckett and the Union Gap) .
"You ain't nothing but a Hound Dog" (Elvis Presley) [ 1956 ] .

The trick here is to define appropriate character sets and tokens, then it all becomes very easy:

   COMPILER Nostalgia

     CHARACTERS
       Control  = CHR(0) .. CHR(31) .
       InTitle  = ANY - Control - '"' .
       InArtist = ANY - Control - "()" .
       Digits   = "0123456789" .
     TOKENS
       Title  = '"' InTitle { InTitle } '"' .
       Artist = "(" InArtist { InArtist } ")" .
       Year   = Digit Digit Digit Digit .

     IGNORE Control

     PRODUCTIONS
       Nostalgia = { PopTune } EOF.
       PopTune   = Title Artist { "," Artist } [ "[" Year "]" ] .
     END Nostalgia /* they don't write them like they used to do! */.

There are, of course, other ways to try to do this. The above solution shows how, since the tokens are clearly demarcated by quotes at either end or by parentheses at either end, it makes some sense to treat Title and Artist as simple tokens - which, unusually for these sorts of grammars, can contain embedded spaces - as well as all sorts od other printable characters, perhaps, like single quotes, digits, hyphens and the like.

Some people miss the point that a year would have exactly four digits, and was best specified as above. The examples showed a variety of ways of adding the year, with suggestive spaces - for example [1963 ] and [ 1962 ] - so that it is preferable to leave the Year token as the digit part, and add the enclosing [ ] brackets as shown above, rather than defining, for example

     TOKENS
       Year   = "[" Digit Digit Digit Digit "]" .

Of course this allows years like 0001 or 9999 which, you might argue, are unrealistic. How could you improve on this?

Some submissions suggested a grammar with productions like

     PRODUCTIONS
       Nostalgia = { PopTune } EOF.
       PopTune   = Title Artist [ "[" Year "]" , { "[" Year "]" } ] .
     END Nostalgia /* they don't write them like they used to do! */.

but the examples suggested that a tune could be recorded by several artists on one year, rather than by one artist in several years. Come to think of it, a better way of recording tunes would be to record the artist along with the year, leading to a grammar like

     PRODUCTIONS
       Nostalgia   = { PopTune } EOF.
       PopTune     = Title Recording { "," Recording } .
       Recording   = Artist [ "[" Year "]" ] .
     END Nostalgia /* they don't write them like they used to do! */.

Would this grammar also describe the sample list of tunes above? If not, where not?