Tutorial for week 19 - more practice in Cocol grammars

1. Develop a Cocol specification for a program that would recognize the staff list in a Science Faculty. Assume that all staff have a title (Dr, Prof, Mr, Miss, Mrs or Ms); that they all have at least a surname (which may be hyphenated, as in Harley-Davidson); that some prefer to be known by their initials, some by their names, and some by a combination of the two; and that they may have degrees. Degrees are odd things. Scientists have only three possibilities: BSc, MSc and PhD. You can't get a PhD unless you have an MSc, you can't get an MSc unless you have a BSc, and you always quote your degrees in the order in which you obtained them. The following would be a valid Science Faculty list of names:

Dr Nospots Onhym, BSc, MSc, PhD.
Mr I. Feildim-Orle.
Miss Fitt, BSc.
Mr P. D. Wossname, BSc, MSc.
Prof Heile Unlykelee.
Mrs Sheik M. Drye, BSc.
Ms Y. Ima Raver.
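As a rough check on whatever grammar you develop, here is a hand-written C++ recognizer for the same language. It is a sketch, not the output of Coco/R, and it rests on assumptions inferred from the examples above: a name is a capital letter followed by lower-case letters (with optional hyphenated parts), an initial is a single capital followed by a full stop, the surname is the last name before any degrees, and each entry ends with a full stop. The names `Tok`, `isName`, `scan` and `isStaffList` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Token kinds produced by a hand-written scanner (hypothetical names).
enum class Tok { Title, Name, Initial, BSc, MSc, PhD, Comma, Period, Bad, End };

// Name = Upper lower { lower } { "-" Upper lower { lower } }, e.g. "Fitt", "Feildim-Orle".
static bool isName(const std::string& w) {
    std::size_t i = 0;
    while (true) {
        if (i >= w.size() || !std::isupper(static_cast<unsigned char>(w[i]))) return false;
        std::size_t start = ++i;
        while (i < w.size() && std::islower(static_cast<unsigned char>(w[i]))) ++i;
        if (i == start) return false;   // at least one lower-case letter per part
        if (i == w.size()) return true;
        if (w[i] != '-') return false;
        ++i;                            // skip the hyphen; another part must follow
    }
}

static std::vector<Tok> scan(const std::string& src) {
    std::vector<Tok> toks;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == ',') { toks.push_back(Tok::Comma); ++i; continue; }
        if (c == '.') { toks.push_back(Tok::Period); ++i; continue; }
        if (!std::isalpha(c)) { toks.push_back(Tok::Bad); ++i; continue; }
        std::size_t start = i;
        while (i < src.size() &&
               (std::isalpha(static_cast<unsigned char>(src[i])) || src[i] == '-')) ++i;
        const std::string w = src.substr(start, i - start);
        if (w == "Dr" || w == "Prof" || w == "Mr" || w == "Miss" || w == "Mrs" || w == "Ms")
            toks.push_back(Tok::Title);       // titles act as reserved words
        else if (w == "BSc") toks.push_back(Tok::BSc);
        else if (w == "MSc") toks.push_back(Tok::MSc);
        else if (w == "PhD") toks.push_back(Tok::PhD);
        else if (w.size() == 1 && std::isupper(c) && i < src.size() && src[i] == '.') {
            toks.push_back(Tok::Initial);     // "P." : the dot belongs to the initial
            ++i;
        } else if (isName(w)) toks.push_back(Tok::Name);
        else toks.push_back(Tok::Bad);
    }
    toks.push_back(Tok::End);
    return toks;
}

// StaffList = { Person } .
// Person    = Title { Initial | Name } Name [ Degrees ] "." .   (surname = last Name)
// Degrees   = "," "BSc" [ "," "MSc" [ "," "PhD" ] ] .
bool isStaffList(const std::string& src) {
    const std::vector<Tok> t = scan(src);
    std::size_t p = 0;
    while (t[p] != Tok::End) {
        if (t[p++] != Tok::Title) return false;
        bool lastWasName = false;
        while (t[p] == Tok::Initial || t[p] == Tok::Name) lastWasName = (t[p++] == Tok::Name);
        if (!lastWasName) return false;       // a surname is required, and it comes last
        for (Tok degree : {Tok::BSc, Tok::MSc, Tok::PhD}) {  // degrees only in this order
            if (t[p] != Tok::Comma) break;
            if (t[p + 1] != degree) return false;
            p += 2;
        }
        if (t[p++] != Tok::Period) return false;
    }
    return true;
}
```

The nested optional structure of `Degrees` is what enforces "no MSc without a BSc, no PhD without an MSc"; the loop over `{BSc, MSc, PhD}` is simply an iterative way of walking that nesting.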

2. The following Cocol grammar describes a set of EBNF productions:

    COMPILER EBNF $XCN
      CHARACTERS
        letter   = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
        digit    = "0123456789" .
        noquote  = ANY - '"' .
      IGNORE  CHR(9) .. CHR(13)
      TOKENS
        nonterminal = letter { letter | digit } .
        terminal    =  '"' noquote { noquote } '"' .
      PRODUCTIONS
        EBNF       = { Production } EOF .
        Production = nonterminal "=" Expression "." .
        Expression = Term { "|" Term } .
        Term       = Factor { Factor } .
        Factor     =   nonterminal | terminal | "[" Expression "]"
                     | "(" Expression ")" | "{" Expression "}" .
    END EBNF.

The PRODUCTIONS section here uses the EBNF "meta brackets" { } in the definitions of EBNF, Expression and Term. How would you write the PRODUCTIONS section without using either the { } or the [ ] meta brackets?
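One way to see that repetition can always be traded for recursion is to write a recursive-descent recognizer in which no routine contains a loop. The C++ sketch below assumes one particular rewriting (right-recursive helper productions with an empty alternative, listed in its header comment); the class name `EbnfRecognizer` and the helper names are hypothetical, and other equally valid rewritings exist.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Recursive-descent recognizer for the EBNF grammar above, written as if every
// { } had been replaced by a right-recursive helper with an empty alternative:
//   ProductionList = Production ProductionList | (empty) .
//   ExpressionTail = "|" Term ExpressionTail | (empty) .
//   TermTail       = Factor TermTail | (empty) .
class EbnfRecognizer {
public:
    explicit EbnfRecognizer(std::string src) : s_(std::move(src)) { next(); }
    // EBNF = ProductionList EOF .
    bool parse() { return productionList() && kind_ == Kind::End; }

private:
    enum class Kind { Nonterminal, Terminal, Symbol, End, Error };
    std::string s_;
    std::size_t pos_ = 0;
    Kind kind_ = Kind::Error;
    char sym_ = 0;  // the punctuation character when kind_ == Symbol

    // Scanner: skips white space and classifies the next token.
    void next() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (pos_ >= s_.size()) { kind_ = Kind::End; return; }
        const char c = s_[pos_];
        if (std::isalpha(static_cast<unsigned char>(c))) {     // letter { letter | digit }
            while (pos_ < s_.size() && std::isalnum(static_cast<unsigned char>(s_[pos_]))) ++pos_;
            kind_ = Kind::Nonterminal;
        } else if (c == '"') {                                  // '"' noquote { noquote } '"'
            const std::size_t close = s_.find('"', pos_ + 1);
            const bool ok = close != std::string::npos && close > pos_ + 1;
            kind_ = ok ? Kind::Terminal : Kind::Error;
            pos_ = ok ? close + 1 : s_.size();
        } else if (std::string("=.|[](){}").find(c) != std::string::npos) {
            kind_ = Kind::Symbol;
            sym_ = c;
            ++pos_;
        } else {
            kind_ = Kind::Error;
            ++pos_;
        }
    }
    bool isSym(char c) const { return kind_ == Kind::Symbol && sym_ == c; }
    bool expect(char c) { if (!isSym(c)) return false; next(); return true; }
    bool startsFactor() const {
        return kind_ == Kind::Nonterminal || kind_ == Kind::Terminal ||
               isSym('[') || isSym('(') || isSym('{');
    }
    bool productionList() {                 // empty alternative when no nonterminal follows
        return kind_ != Kind::Nonterminal || (production() && productionList());
    }
    // Production = nonterminal "=" Expression "." (the nonterminal is already current)
    bool production() { next(); return expect('=') && expression() && expect('.'); }
    bool expression() { return term() && expressionTail(); }
    bool expressionTail() {
        if (!isSym('|')) return true;       // empty alternative
        next();
        return term() && expressionTail();
    }
    bool term() { return factor() && termTail(); }
    bool termTail() { return !startsFactor() || (factor() && termTail()); }
    bool factor() {
        if (kind_ == Kind::Nonterminal || kind_ == Kind::Terminal) { next(); return true; }
        const char close = isSym('[') ? ']' : isSym('(') ? ')' : isSym('{') ? '}' : 0;
        if (close == 0) return false;
        next();
        return expression() && expect(close);
    }
};
```

This recognizer accepts the PRODUCTIONS section shown above (as plain text) and rejects, for example, `A = .`, because a Term needs at least one Factor. The recursive form describes the same language, but it consumes one stack frame per repetition, whereas the { } form maps onto a simple loop.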

3. Section 5.9.5 shows how simple strings can be specified in Cocol:

   CHARACTERS
     noquote1 = ANY - "'" - CHR(0) .
     noquote2 = ANY - '"' - CHR(0) .

   TOKENS
     string =   "'" noquote1 { noquote1 } "'"
              | '"' noquote2 { noquote2 } '"' .

and section 8.7.2 shows how Clang/Pascal string literals can be specified (literals in which two successive apostrophes serve as an "escape sequence" denoting a single apostrophe):

   CHARACTERS
     cr       = CHR(13) .
     lf       = CHR(10) .
     instring = ANY - "'" - cr - lf - CHR(0) .

   TOKENS
     string = "'" (instring | "''") { instring | "''" } "'" .

What CHARACTERS and TOKENS clauses would you need in order to specify the form of strings in C++, which use backslash escape sequences, as demonstrated by:

"A bit of string with newline \n characters \t tab characters and \\ backslash characters"
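Before writing the Cocol clauses it may help to see which character classes a scanner must distinguish: ordinary characters (anything but a quote, a backslash or a line break), the backslash itself, and what may legally follow it. The C++ function below is a hedged sketch of such a scanner; `isCppStringLiteral` is a hypothetical name, and it deliberately ignores encoding prefixes (L, u8, ...), raw strings and universal-character-names (\u, \U).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `s` is exactly one ordinary (narrow, non-raw) C++ string literal.
// Simplified: no encoding prefixes, no raw strings, no \u or \U escapes.
bool isCppStringLiteral(const std::string& s) {
    const std::string simpleEscapes = "'\"?\\abfnrtv";   // characters allowed after '\'
    if (s.empty() || s[0] != '"') return false;
    std::size_t i = 1;
    while (i < s.size()) {
        const char c = s[i];
        if (c == '"') return i + 1 == s.size();               // closing quote must end the token
        if (c == '\n' || c == '\r' || c == '\0') return false; // no raw line breaks or NULs
        if (c != '\\') { ++i; continue; }                     // ordinary character
        if (++i == s.size()) return false;                    // backslash at end of input
        const char e = s[i];
        if (simpleEscapes.find(e) != std::string::npos) {
            ++i;                                              // \n, \t, \\, \" ...
        } else if (e >= '0' && e <= '7') {                    // octal: 1 to 3 digits
            for (int n = 0; n < 3 && i < s.size() && s[i] >= '0' && s[i] <= '7'; ++n) ++i;
        } else if (e == 'x') {                                // hex: \x and 1+ hex digits
            const std::size_t start = ++i;
            while (i < s.size() && std::isxdigit(static_cast<unsigned char>(s[i]))) ++i;
            if (i == start) return false;
        } else {
            return false;                                     // unrecognised escape
        }
    }
    return false;                                             // unterminated literal
}
```

Unlike the Clang/Pascal form, the empty string `""` is accepted here, since C++ allows it. A Cocol specification may choose to check only that a backslash is followed by some non-line-break character and leave the finer validation of escapes to a later phase.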

4. Consider the by now rather hackneyed grammar for expressions, written out again below. How would you add attributes so that a parser, after recognising an Expression, could determine whether that Expression was a "constant expression" (one that, in principle, could be evaluated at compile time) or a "variable expression" (one that could not be evaluated until run time)? Assume that the Designator production can be attributed so that it returns the result of searching the symbol table to determine whether the associated identifier denotes a variable or a constant.

  Expression
  =
  (   "+" Term
    | "-" Term
    | Term
  ) { AddOp
      Term
     } .

  Term
  =  Factor
     { MulOp
       Factor
     } .

  Factor
  =
      Designator
    | number
    | "(" Expression
      ")" .
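The attribute needed here is a synthesized boolean: a number is constant, a Designator is constant if the symbol table says so, and every larger phrase is constant only if all of its operands are. The C++ sketch below threads that attribute through a recursive-descent parser as return values. It is an illustration under assumptions, not a Coco/R attributed grammar: `ConstnessParser` is a hypothetical name, the symbol table is modelled as a `std::map` from identifier to "is constant", Designator is reduced to a plain identifier, and AddOp and MulOp are assumed to be `+ -` and `* /`.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Each parsing routine returns a synthesized attribute: true if the phrase it
// recognised is a constant expression. Throws std::runtime_error on syntax errors.
class ConstnessParser {
public:
    // symtab: identifier -> true if it denotes a constant (a stand-in symbol table)
    ConstnessParser(std::string src, std::map<std::string, bool> symtab)
        : s_(std::move(src)), table_(std::move(symtab)) {}

    bool parseExpression() {
        const bool isConst = expression();
        skipSpace();
        if (i_ != s_.size()) throw std::runtime_error("unexpected trailing input");
        return isConst;
    }

private:
    std::string s_;
    std::map<std::string, bool> table_;
    std::size_t i_ = 0;

    void skipSpace() {
        while (i_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[i_]))) ++i_;
    }
    bool accept(char c) {
        skipSpace();
        if (i_ < s_.size() && s_[i_] == c) { ++i_; return true; }
        return false;
    }
    // Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } .
    bool expression() {
        if (!accept('+')) accept('-');                 // optional leading sign
        bool isConst = term();
        // term() is called first so that && never short-circuits the parse
        while (accept('+') || accept('-')) isConst = term() && isConst;
        return isConst;
    }
    // Term = Factor { MulOp Factor } .
    bool term() {
        bool isConst = factor();
        while (accept('*') || accept('/')) isConst = factor() && isConst;
        return isConst;
    }
    // Factor = Designator | number | "(" Expression ")" .
    bool factor() {
        if (accept('(')) {
            const bool isConst = expression();
            if (!accept(')')) throw std::runtime_error("')' expected");
            return isConst;
        }
        if (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_]))) {
            while (i_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[i_]))) ++i_;
            return true;                               // a number is always constant
        }
        if (i_ < s_.size() && std::isalpha(static_cast<unsigned char>(s_[i_]))) {
            const std::size_t start = i_;
            while (i_ < s_.size() && std::isalnum(static_cast<unsigned char>(s_[i_]))) ++i_;
            const auto it = table_.find(s_.substr(start, i_ - start));
            if (it == table_.end()) throw std::runtime_error("undeclared identifier");
            return it->second;                         // attribute from the symbol table
        }
        throw std::runtime_error("factor expected");
    }
};
```

With a table in which `Max` is a constant and `x` a variable, `2 * (Max + 1)` yields true and `-Max * x` yields false. In an attributed Cocol grammar the same information would flow out of each nonterminal as an output attribute rather than as a C++ return value.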


© P.D. Terry