Computer Science 301 - 2005

Test on Prac 22 - Week beginning 26 September 2005

Please answer all questions in the spaces provided (three sides). Text books may not be used. Max 25

This test is intended to check your ability to wite simple Coco/R specifications and your understanding of LL(1) rules. It should only take you about 25 minutes at the most. Textbooks may not be used.

1. Oh dear. I need reminding yet again. What are your name (surname) and student number? [1]

Pat Terry 663T0844

2. Complete a Cocol specification - the CHARACTERS and TOKENS sections only for a list of the names of people. Here is an example of such a list:

      Patrick David Terry.
      John-John Cecil Montmorency Stuart-Watkins.
      Britney Spears.  Madonna.
      Paddy O'Toole.
      Charles Philip Arthur George Windsor.
      Camilla Parker-Bowles.  Diana Spencer.

Assume that individual names will start with an upper case letter, and that some names might have hyphens or apostrophes within them (but not at the beginning or end). [7]

The trick here is to note that the surnames all end with a period (fullstop), so that we can write distinctive tokens for first names and surnames:

    COMPILER NameList
    /* A grammar to describe a simple list of people's names
       P.D. Terry, Rhodes University, 2005 */
    CHARACTERS
      control  = CHR(0) .. CHR(31) .
      ULetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" .
      LLetters = "abcdefghijklmnopqrstuvwxyz" .
    TOKENS
      FirstName = ULetters { LLetters | "'" ULetters | "-" ULetters } .
      Surname   = ULetters { LLetters | "'" ULetters | "-" ULetters } "." .
    IGNORE control
    PRODUCTIONS
      NameList = { OneName } .
      OneName = { FirstName } Surname .
    END NameList.

Actually it is unlikely that a name with a Hyphen or Apostrophe would have only one letter after it, or that any name would have fewer than 3 letters, although some Asian names are only two letters long. So better tokens might be

TOKENS FirstName = ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters . Surname = ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters "." .

we have assumed that all letters "within" a name are lower-case. This is not always the case. Attendees of Highland gatherings often have surnames like MacGregor or McKay or MacHine, and such names might even be used as internal names, such as James MacPherson Farquharson Colquhoun. We could invent tokens for these:

TOKENS FirstName = ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters | ( "Mac" | "Mc" ) ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters . Surname = ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters "." | ( "Mac" | "Mc" ) ULetters { LLetters | "'" ULetters | "-" ULetters } LLetters "." .

3. You will recall Prac 20 where you had to write simple PVM assembler code. Here is an example of the sort of code you might have written:

       0   DSP    2
       2   LDA    0          ; push address of X
       4   LDC    -1         ; push constant -1
           STO               ;        X := -1
       7   LDA    1          ; push address of Y
       9   LDA    0          ; push address of X
           LDV               ; dereference - value of X (-1) is now on top of stack
      12   STO               ;        Y := X
           PRNS   "finished"
           HLT

in which we draw attention to the following:

(a) only one statement appears on a line
(b) statements may have an optional number at the start
(c) some opcodes (like LDC) take an integer argument, others (like STO) do not.
(d) a line may include a comment
(e) the lines in text files are separated by a CR LF sequence. (CR = CHR(13), LF = CHR(10))

Complete an LL(1) compliant Cocol specification of this language. Limit your description to the opcodes shown above [12]

There are several ways of doing this. Here is a solution acceptable for this test. The trick here is that we cannot ignore the end-of-line.

    COMPILER PVMAsm
    /* Grammar for subset of PVM assembler language
       P.D. Terry, Rhodes University, 2005 */
    CHARACTERS
      control     = CHR(0) .. CHR(31) .
      Printable   = ANY - control .
      InString    = Printable - '"' .
      Digits      = "0123456789" .
      LF          = CHR(10) .
      CR          = CHR(13) .
    TOKENS
      Number      = [ "-" ] Digits { Digits } .
      String      = '"' { InString } '"' .
      EOL         = LF .
      Comment     = ";" { Printable } CR .
    IGNORE control - LF
    PRODUCTIONS
      PVMAsm      = { Statement } .
      Statement   = [ Number ] Instruction [ Comment ] EOL .
      Instruction = TwoWord | OneWord | PrintString .
      TwoWord     = ( "LDA" | "LDC" | "DSP" ) Number .
      OneWord     = "STO" | "LDV" | "HLT" .
      PrintString = "PRNS" String .
    END PVMAsm.

A better definition for String might incorporate "escape sequences", and one might be tempted to use the COMMENTS feature of Coco/R. Here's what we might come up with:

COMPILER PVMAsm /* Grammar for subset of PVM assembler language P.D. Terry, Rhodes University, 2005 */ CHARACTERS control = CHR(0) .. CHR(31) . Backslash = CHR(92) . Printable = ANY - control . InString = Printable - Backslash - '"' . Digits = "0123456789" . LF = CHR(10) . CR = CHR(13) . TOKENS Number = [ "-" ] Digits { Digits } . String = '"' { InString | Backslash Printable } '"' . EOL = LF . COMMENTS FROM ";" TO CR IGNORE control - LF PRODUCTIONS PVMAsm = { Statement } . Statement = [ Number ] Instruction EOL . Instruction = TwoWord | OneWord | PrintString . TwoWord = ( "LDA" | "LDC" | "DSP" ) Number . OneWord = "STO" | "LDV" | "HLT" . PrintString = "PRNS" String . END PVMAsm.

4. Consider the following grammar expressed in EBNF for describing the progress of an excellent compiler course offered at a University Near You (careful, it is not the same grammar as you saw last week):

Course = Introduction Section { Section } Conclusion .
Introduction = [ "lecture" "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

(a) Is this an LL(1) grammar? If not, where does it break the rules (justify your argument, do not simply guess)? [3]

Most people could see that this was not an LL(1) grammar. Introduction is nullable - although many people neglected to say this - and

FIRST(Introduction) = { "lecture" },

while

FOLLOW(Introduction) = { "lecture" , "practical", "tutorial" , "handout" , "test" }.

Note that there is no problem with the nullable component of Section, or with the nullable component within? Conclusion. Some students said that because a Section could start with "practical" "test" and be followed by "test" that the grammar was non LL(1), but this is not true. Be careful where and how you test Rule 2.

(b) If it breaks the rules, do you suppose a recursive descent parser would have trouble recognizing the history of a course? Explain. [2]

A point I wish to stress here is that a recursive descent parser can handle some non-LL(1) grammars sensibly. The most obvious example of this is the "dangling else" grammar (go and read that part of the text again!). In this particular case, however, there would be trouble. If the string to be parsed began with "lecture" the parser would not know which route to follow unless it could look ahead to the next token - and even this would not help if the next token were "handout".

To take this further, contrast this with the similar grammar seen last time:

Course = Introduction Section { Section } Conclusion .
Introduction = "lecture" [ "handout" ] .
Section = { "lecture" | "practical" "test" | "tutorial" | "handout" } "test" .
Conclusion = [ "panic" ] "examination" .

This is also non-LL(1), as the nullable component within Introduction has the FIRST set { "handout" } which has "handout" in common with the FOLLOW set. However a parser would have no trouble parsing a string like

"lecture" "handout" "lecture" ...

as it would simply associate the handout with an Introduction, and not with a following Section (this is completely analogous with the dangling else problem, of course).

(c) Do you think it is possible to complete the course described by this grammar without attending any lectures? (Explain your reasoning.) [2]

Yes, indeed. A string like

"test" "examination"

is the minimum one that is acceptable to the parser, one which would result if all the nullable options were exercised.

(d) (Bonus mark) So why are some of you missing some of my lectures? [1]

There is no agreement on the correct solution here. Probably the closest is that some people are just too lazy to attend, or have stayed up too late and over sleep. Crazy, really? I'm willing to bet that if you pay for a drink at the Rat you would drink it; since you've paid for a whole lot of drinks at the Fountain of Knowledge, why don't you drink those too? Those who submitted no solution but who are known always to attend earned the bonus mark anyway. It is very rarely that I award marks for leaving an answer blank. It is not likely to happen again!