Test for week 3

Computer Science 3 - 2017 - Test on Week 3 - Solutions

Please answer all questions in the spaces provided (three sides). Text books may not be used. Max 40

This test is intended to check your understanding of important concepts in formal language theory and the construction of grammars, with some questions relating back to earlier sections. It should probably take you about half an hour.

1. Who says it's hard to spot for my tests? What are your student number and surname? [1]

I'm a kindly soul. Bonus mark if you can remember mine. [1]

63T0884 Terry Be careful to answer the questions accurately ...

2. A language L restricted by a grammar G is defined by L(G) = {N, T, S, P}.

(a) What do you understand by the terms Sentential Form and Sentence? [3]

A sentential form is the goal or start symbol, or any string that can be derived from it:

Sentential form = σ | S ⇒^* σ ∧ σ ∈ (N ∪ T )^*

A sentence is a string derivable from S that contains only terminals.

Sentence = w | S ⇒^* w ∧ w ∈ T ^*

(b) What do you understand by the terms Terminal and Non-Terminal? [3]

Terminals are the atomic (indivisible) elements of a language that appear in sentences. Non- terminals are useful descriptors of syntactic elements that are useful in describing the "phrase structure" of sentences - ideas like "function call" or "variable declaration" if one is talking about computer languages, or "qualified noun" or "dependent clauses" if one is talking about English sentences. Non-terminals and terminals may typically appear in the interior nodes of parse trees, but leaf nodes of such trees will each contain a single terminal.

In terms of L(G) we simply have that a is a terminal if a ∈ T and A is a non-terminal if A ∈ N.

3. (a) What is meant by the statement "a grammar is ambiguous"? [2]

There is a very simple way to express this - a grammar is ambiguous if at least one sentence in the language that it defines can be parsed in at least two ways (that is can have more than one parse tree rooted in the goal symbol and with identical leaf nodes). Note that we are talking about a single grammar - we are not talking about two equivalent but distinct grammars that might each be able to describe the same sentences.

(b) The following grammar describes phrases like "wild cat" or "gentle tame obedient dog". Show that it is ambiguous. Justify your thinking, don't waffle! [4]

            Phrase     = Adjectives Noun.
            Adjectives =   Adjectives Adjectives
                         | "gentle" | "wild" | "hungry" | "fierce" | "tame" | "obedient" | ε .
            Noun       = "dog" | "cat" .

To answer this sort of question you must be able to draw two parse trees (or specify two distinct expansions from the goal symbol in some other way) that lead to the same sentence. There are many examples one can use here, and most people could come up with one. The real menace is not the ε as such (though that does not help) - it is that the option

Adjectives = Adjectives Adjectives

can expand to another piece of sentential form like

Adjectives = Adjectives Adjectives Adjectives

in two ways - either by applying the option again to the leftmost Adjectives or to the rightmost Adjectives. So here is an example where we can draw two distinct trees (without cluttering it up with ε, although examples with an ε are acceptable too):

                                      Phrase                                     Phrase
                             ┌───────────────────┐                ┌─────────────────────────┐
                             │                   │                │                         │
                        Adjectives              Noun          Adjectives                  Noun
                    ┌──────────────────┐         │         ┌────────────────┐               │
                    │                  │         │         │                │               │
                Adjectives          Adjectives   │      Adjectives      Adjectives          │
               ┌───────────┐           │         │         │          ┌───────────┐         │
               │           │           │         │         │          │           │         │
           Adjectives  Adjectives      │         │         │      Adjectives  Adjectives    │
               │           │           │         │         │          │           │         │

            "gentle"    "tame"    "obedient"   "dog"    "gentle"   "tame"    "obedient"   "dog"

This is really very easy! One solution is as follows - and because it is LL(1) it can't be ambiguous:

            Phrase     = { Adjectives } Noun.
            Adjectives = "gentle" | "wild" | "hungry" | "fierce" | "tame" | "obedient" .
            Noun       = "dog" | "cat" .

But be careful not to write, as you might be tempted to do

            Phrase     = { Adjectives } Noun.
            Adjectives = "gentle" | "wild" | "hungry" | "fierce" | "tame" | "obedient" | ε.
            Noun       = "dog" | "cat" .

which is actually ambiguous once more. Curly braces around { Adjectives } already imply an ε option, and you don't need/want to have two ambiguous ways of doing this!

Another grammar might be a right recursive LL(1) one:

            Phrase     = Adjective Phrase | Noun.
            Adjective  = "gentle" | "wild" | "hungry" | "fierce" | "tame" | "obedient" .
            Noun       = "dog" | "cat" .

4. A grammar for expressions a bit like the one you extended in the practical should be familiar

             Expression = [ "+" | "-" ] Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = "a" | "b" | "c" | "(" Expression ")" .

And here is another variation: (I warned you that you are surrounded by evil expression grammars!)

             Expression = Term { "+" Term | "-" Term } .
             Term       = Factor { "*" Factor | "/" Factor } .
             Factor     = [ "+" | "-" ] ( "a" | "b" | "c" | "(" Expression ")" ) .

Study them carefully. Both grammars claim to be able to derive the simple expression - a + b * c. Is that claim correct? (draw parse diagrams to justify your answer). [4]

                     Expression                           Expression
                         │                                    │
           ┌─────┬───────┼─────────┐                   ┌──────┼──────────┐
           │     │       │         │                   │      │          │
           │    Term     │        Term                Term    │         Term
           │     │       │         │                   │      │          │
           │     │       │   ┌─────┼─────┐             │      │    ┌─────┼─────┐
           │     │       │   │     │     │             │      │    │     │     │
           │   Factor    │ Factor  │   Factor        Factor   │  Factor  │   Factor
           │     │       │   │     │     │             │      │    │     │     │
           │     │       │   │     │     │        ┌────┤      │    │     │     │
           │     │       │   │     │     │        │    │      │    │     │     │
           -     a       +   b     *     c        -    a      +    b     *     c

Do the grammars justify being called "equivalent"? If you don't think so, illustrate by example. [4]

They are not equivalent. The second grammar can derive an expression like a + b * - c which the first one cannot. While there might be many expressions that they can both derive, just one rogue is enough to disprove the hypothesis. Can you think up any others? Should an expression like that just suggested have a meaning? And now that you are getting the hang of it, what about expressions like

-a + -b + + - c * / (-a - +b)

5. (a) What is meant by the statement "a non-terminal is nullable"? [2]

Students tend to submit solutions like: "A is nullable if it has a production rule like A = a | ε ", or a similar restricted example. But it's better to say, more simply

A is nullable if A ⇒^* ε

as the production rules for A may not have any explicit ε in them, but possibly specify a list of alternative sentential forms from one or more of which it is eventually possible to derive an empty string - this might involve chasing through several steps before it becomes apparent!

In English - a non-terminal A is nullable if it is possible to apply productions to the sentential forms derived from A in such a way that a null sentence can be derived.

(b) In the grammar defined by the following productions, identify (giving reasons) the nullable non- terminals. [3]

            A = B "a" | C { D } .
            B = [ A "b" ] .
            C = B { "a" } .
            D = "a" B "c" .

Most people quickly see that B is obviously nullable. Since C can derive an empty string from B followed by an empty string from { "a" }, C is also nullable. Less obvious, perhaps, is that A is also nullable. as the rule A = C { D } can also lead to an empty string.

Note that a production like A = [ B ] { D } does not mean that B and D are nullable - it means that A is nullable. We can speak of [ B ] and { D } as "nullable sections" however.

6. Here is the original grammar as supplied in the last prac, before you started playing with it:

    COMPILER Calc  $CN
    /* Simple four function calculator
       P.D. Terry, Rhodes University, 2017 */

    CHARACTERS
      digit      = "0123456789" .
      hexdigit   = digit + "ABCDEF" .

    TOKENS
      decNumber  = digit { digit } .
      hexNumber  = "$" hexdigit { hexdigit } .

    IGNORE CHR(0) .. CHR(31)

    PRODUCTIONS
      Calc       = { Expression "=" } EOF .
      Expression = Term { "+" Term  |  "-" Term } .
      Term       = Factor { "*" Factor |  "/" Factor } .
      Factor     = decNumber | hexNumber .
    END Calc.

(a) Suppose you wanted the calculator to recognize numbers like 99.87, 97. and .3456 with decimal digits on either or both sides of the "point". What would you have to add to the grammar? Hint "what would you have to add" is Pat Terry-speak for "make actual code changes to the grammar, don't write a long tedious essay!") [4]

The trap that is easy to fall into is to write down a TOKEN definition that permits a number to have nothing but a point in it, that is, no digits at all. Even worse, to have a floatNumber with only digits - but that is what is needed for decNumber. A correct answer might be

    TOKENS
      decNumber   = digit { digit } .
      hexNumber   = "$" hexdigit { hexdigit } .
 *    floatNumber =   digit { digit } "." { digit }        // at least one digit before the point
 *                  | "." digit { digit } .                // no digits before the point

    PRODUCTIONS
      Calc        = { Expression "=" } EOF .
      Expression  = Term { "+" Term  |  "-" Term } .
      Term        = Factor { "*" Factor |  "/" Factor } .
 *    Factor      = decNumber | hexNumber | floatNumber .  // must add to this production too
    END Calc.

(b) Someone with an imagination, dreaming up an example like ( ( ( 4 + 4 * 6 ) ) ) as a valid expression for testing deep nesting of expressions, has suggested that the production for Factor could be modified to read

      Factor     = decNumber | hexNumber  | { "(" } Expression { ")" } .

What is your reaction? Explain by giving examples, preferably. [3]

This is another common trap. That production for Factor has no way of ensuring that the parentheses "balance". It is quite sufficient to have

      Factor     = decNumber | hexNumber  | "(" Expression ")" .

which (recursively) already allows parenthesized expressions to be nested within expressions.

For practice, check whether the incorrect grammar could be deemed to be LL(1).

(b) Someone else with an imagination, dreaming up an example like 5 * abs(3 - 9) as a valid expression for testing the abs function, has suggested that the production for Factor could be modified to read

      Factor     = decNumber | hexNumber  | "abs(" | "(" Expression ")"  .

What is your reaction? Explain by giving examples, preferably. [3]

This is another horror. For starters, the precedence of the operator | is such that the production is equivalent to the following:

      Factor     =   decNumber | hexNumber
                   | (   "abs("    )
                   | (    "(" Expression ")"    )
                   .

Secondly, defining a terminal like "abs(" is not good practice. The abs is the name of a function, and the left parenthesis is conceptually part of an envelope around a parameter list. You might want to separate abs and ( with spaces in your source code, for increased clarity. What the poor imaginative student came up with would have been better coded in one of the following ways:

      Factor     =     decNumber | hexNumber
                    | "(" Expression ")"
                    | "abs" "(" Expression ")"  .

      Factor     =     decNumber | hexNumber
                    | [ "abs" ] "(" Expression ")" .