P Test on week 21

Computer Science 3 - 2006 - Test on Week 21

Please answer all questions in the spaces provided (three sides). Text books may not be used. Max 35.

This test is intended to check your understanding of important concepts in formal language theory and the construction of grammars, with some questions relating back to earlier sections. It should probably take you about half an hour.

1. You must have swotted for hours for this hot spot! What are your name and student number? [1]

Pat Terry 63T0884

2. A language L restricted by a grammar G is defined by L(G) = {N, T, S, P}.

(a) What do you understand by the terms Sentential Form and Sentence? [2]

A sentential form is the goal or start symbol, or any string that can be derived from it:

Sentential form = s | S Þ^* s Ù s Î (N È T )^*

A sentence is a string derivable from S that contains only terminals.

Sentence = w | S Þ^* w Ù w Î T ^*

(b) What do you understand by the terms Terminal and Non-Terminal? [2]

Terminals are the atomic (indivisible) elements of a language that appear in sentences. Non- terminals are useful descriptors of syntactic elements that are useful in describing the "phrase structure" of sentences - ideas like "function call" or "variable declaration" if one is talking about computer languages, or "qualified noun" or "dependent clauses" if one is talking about English sentences.

In terms of L(G) we simply have that a is a terminal if a Î T and A is a non-terminal if A Î N.

3. (a) What is meant by the statement "a grammar is ambiguous"? [2]

There is a very simple way to express this - a grammar is ambiguous if at least one sentence in the language that it defines can be parsed in at least two ways (that is can have more than one parse tree rooted in the goal symbol and with identical leaf nodes). Note that we are talking about a single grammar - we are not talking about two equivalent but distinct grammars that might each be able to describe the same sentences.

(b) Demonstrate that the following grammar that describes phrases like "lucky girl" or "poor silly twisted boy" is ambiguous. [4]

            Phrase     = Adjectives Noun.
            Adjectives =   Adjectives Adjectives
                         | "silly" | "twisted" | "lucky" | "clever" | "poor" | e .
            Noun       = "boy" | "girl" .

To answer this sort of question you must be able to draw two parse trees (or specify two distinct expansions from the goal symbol in some other way) that lead to the same sentence. There are many examples one can use here, and most people could come up with one. The real menace here is that the option

Adjectives = Adjectives Adjectives

can lead to another string like

Adjectives = Adjectives Adjectives Adjectives

in two ways - either by applying the option again to the leftmost Adjectives or to the rightmost Adjectives. So here is an example where we can draw two distinct trees:

Phrase Phrase .-------------------. .-------------------------. | | | | Adjectives Noun Adjectives Noun .------------------. | .----------------. | | | | | | | Adjectives Adjectives | Adjectives Adjectives | .-----------. | | | .-----------. | | | | | | | | | Adjectives Adjectives | | | Adjectives Adjectives | | | | | | | | | "poor" "silly" "twisted" "boy" "poor" "silly" "twisted" "boy"

This is really very easy! One solution is:

            Phrase     = { Adjectives } Noun.
            Adjectives = "silly" | "twisted" | "lucky" | "clever" | "poor" .
            Noun       = "boy" | "girl" .

Be careful not to write, as many people did

Phrase = { Adjectives } Noun. Adjectives = "silly" | "twisted" | "lucky" | "clever" | "poor" | e. Noun = "boy" | "girl" .

which is actually ambiguous once more. Curly braces around { Adjectives } already implies an e option, and you don't need/want to have two ways of doing this!
Another grammar might be a recursive one on the lines of one suggested in the book somewhere:

Phrase = Adjective Phrase | Noun. Adjective = "silly" | "twisted" | "lucky" | "clever" | "poor" . Noun = "boy" | "girl" .

4. The contents of a useful textbook (if you can find a copy) begins as follows:

    All you need to know to be able to pass the compiler course.

                         by

                     Pat Terry.

    Chapter 1  Bribery is unlikely to succeed.

    Chapter 2  Understand the phases of compilation.
        2.1    Lexical and syntactic analysis are easily confused
        2.2    Constraint analysis involves the concept of type
        2.3    Code generation for the PVM is a breeze

    Chapter 3  Get clued up on grammars.
        3.1    Terminals
        3.2    Sentences and sentential forms
        3.3    Productions
        3.4    EBNF and Cocol
        3.5    Ambiguity is bad news

and so on. Assume that the words are all simple, but notice that full stops (periods) appear in significant places. Since this is a long course, there are many more than three chapters, and some of them have many sub-sections.

Complete the following Cocol description of such a contents section. [10]

Here is the sort of thing I was hoping to see. We can recognise phrase structure concepts like Title, Author, Chapter heading, Subsection headings and so on, so introduce non-terminals for these into the PRODUCTIONS section. We can also recognise lexical concepts like word and number - and, in fact, two kinds of numbers, so introduce these into the TOKENS section based on pretty obvious CHARACTER sets. Finally, there are some candidates for "key words" - terminals like "Chapter" and "by". Of course these last words might appear in the "wrong" places quite legitimately

         3.1  Revisit the key concepts in Chapter 2.4 one by one

but those are the sort of complications that are not in the "spirit" of this silly example, and I did not expect students to worry too much about them - please accept that these sorts of exercises are a bit artificial!

COMPILER HotTips $NC CHARACTERS uLetter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" . lLetter = "abcdefghijklmnopqrstuvwxyz" . letter = uLetter + lLetter . digit = "0123456780" . TOKENS word = letter { letter } . /* any letter will do, but there must be at least one */ number = digit { digit } . /* for numbering chapters */ section = digit { digit } "." digit { digit } . /* for subsections */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS HotTips = Title "by" Author { Chapter } . Title = word { word } "." . Author = word { word } "." . Chapter = "Chapter" number Title { Subsection } . Subsection = section word { word } . /* notice no final "." */

There are, obviously, other ways of doing this. For example, since the combination word { word } comes up so often one might use

PRODUCTIONS HotTips = Sentence "by" Sentence { Chapter } . Sentence = Words "." . Words = word { word } . Chapter = "Chapter" number Sentence { Subsection } . Subsection = section Words . /* notice no final "." */ END HotTips.

or

PRODUCTIONS HotTips = Words "." "by" Words "." { Chapter } . Chapter = "Chapter" number Sentence { section Words } . Words = word { word } . END HotTips.

One might want to insist that the first word in a title, section heading etc had to start with an uppercase letter. This gets a bit messier, but perhaps something like this comes to mind:

COMPILER HotTips $NC CHARACTERS uLetter = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" . lLetter = "abcdefghijklmnopqrstuvwxyz" . letter = uLetter + lLetter . digit = "0123456780" . TOKENS uWord = uLetter { letter } . /* must start with an upper case letter */ lWord = lLetter { letter } . /* must start with a lower case letter */ number = digit { digit } . /* for numbering chapters */ section = digit { digit } "." digit { digit } . /* for subsections */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS HotTips = Sentence "by" Sentence { Chapter } . Sentence = Words "." . Words = uWord { uWord | lWord } . Chapter = "Chapter" number Sentence { Subsection } . Subsection = section Words . /* notice no final "." */ END HotTips.

Two frequent mistakes at this stage of your development are (a) to use non-terminals where tokens would be better:

Subsection = number "." number Words . /* don't do this - 3.4 is one token! */

and (b) to add punctuation like full stops into the token definition for words or numbers in the wrong way

TOKENS /* if necessary or useful */ word = letter { letter | "." } . /* would allow hello.word as one word */ number = digit { digit | "." } . /* would allow 34...56 as one number */

5. A grammar for expressions like the one you extended in the practical should be familiar

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = "a" | "b" | "c" | "(" Expression ")" .

Suppose we also wanted our grammar to be able to describe expressions in which entities could be raised to a power - an operation denoted by "^", and exemplified by a ^ b or (a + b) ^ c or a ^ b ^ c * d .

The exponentiation operator (as "^" is called) is a high precedence operator and associates from right to left so that a ^ b ^ c means a ^ (b ^ c) and not (a ^ b) ^ c. Suggest how this operator would be incorporated in an extension of the grammar above. [5]

The hint was given that this is a high precedence operator. It's higher than * and /, but lower than (), and the correct solution is to introduce another level into the hierarchy. To get the correct "right associativity" requires an option in a right recursive production

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = Primary [ "^" Factor ] .
             Primary    = "a" | "b" | "c" | "(" Expression ")" .

Now as an interesting exercise you might like to ponder why you cannot write, instead

Expression = Term { "+" Term | "-" Term } . Term = Factor { "*" Factor | "/" Factor } . Factor = Primary [ "^" Expression ] . Primary = "a" | "b" | "c" | "(" Expression ")" .

and you might also like to ponder whether the following alternative is acceptable:

Expression = Term { "+" Term | "-" Term } . Term = Factor { "*" Factor | "/" Factor } . Factor = { Primary "^" } Primary . Primary = "a" | "b" | "c" | "(" Expression ")" .

6. (a) What is meant by the statement "a non-terminal is nullable"? [2]

Students tend to submit solutions like: "A is nullable if it has a production rule like A = a | e ", or a similar restricted example. But it's better to say, more simply

A is nullable if A Þ^* e

as the production rules for A may not have any explicit e in them, but possibly specify a list of alternative sentential forms from which it is eventually possible to derive an empty string - this might involve chasing through several steps before it becomes apparent!

In English - a non-terminal A is nullable if it is possible to apply productions to the sentential forms derived from A in such a way that a null sentence can be derived.

(b) In the grammar defined by the following productions, identify (giving reasons) the nullable non- terminals. [3]

            A = B "a" | C { D } .
            B = [ A "b" ] .
            C = B { "a" } .
            D = "a" B "c" .

Most people quickly see that B is obviously nullable. Since C can derive an empty string from B followed by an empty string from { "a" }, C is also nullable. Less obvious, perhaps, is that A is also nullable. as the rule A = C { D } can lead to an empty string as well.