Computer Science 301 - 2008


Test on Prac 24 - Week beginning 6 October 2008 - Solutions

This test is intended to check your ability to write attributed Cocol grammars. It should only take you about 15 minutes at the most. Textbooks may be used.

1. Tell me often enough, and I will believe anything. What are your surname (and initials)?

Terry P.D.

2. You thought you had heard the last of those bagpipes, didn't you? Wrong again! The grammar below describes a medley event at a Highland Gathering:

      COMPILER Event $CN
      /* Describe simple pipe band competition event */

      IGNORECASE

      CHARACTERS
        letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".

      TOKENS
        NameOfBand = letter { letter } .

      IGNORE CHR(0) .. CHR(31)

      PRODUCTIONS
        Event   = Band { Band } .
        Band    = NameOfBand Medley .
        Medley  = OneTune { OneTune } "break" .
        OneTune =   "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel"
                  | "Strathspey" { "Strathspey" } "Reel" .

      END Event.
The organisers have decided that this year each band must play at least two jigs.

What would you have to add to this grammar so that the parser could disqualify any band that did not play at least two jigs and would also tell you (a) the total number of bands that had competed in the event (b) the name of the band that played the medley with the greatest number of tunes.

This was quite well attempted, although there are a few students who still have little clue about adding actions to a grammar, or, worse still, any clue about how and why one would perform the additions and checks needed to solve what is really a very simple problem. Typical points missed were as follows (please look at all of these, even if you did get a partially correct solution):

(a) In a non-recursive problem as simple as this one, the easiest way to ensure that all productions have access to all the variables is to declare them as static fields of the parser. Java does not have proper reference parameter passing (in this respect it is quite awful as a language) so you can either pass values "down" or "up", but not in both directions (which you would really need to do here) unless you pass references to specially created objects of an appropriate class, and that is really overkill for this problem. If you don't know what I am talking about, then that is even more worrying, and you should come to seek clarification. You should have mastered those aspects of Java programming in first year!

(b) You have to count the total number of tunes and the total number of jigs separately, and these counts have to be reset to zero for each band.

(c) The OneTune production has a slightly deceptive name. A sequence of strathspeys and the mandatory reel contribute two or more tunes to the total. Nearly everyone missed this. piping lessons for you all next week?

(d) NameOfBand is the name of a terminal and is not (and cannot be) followed by <parameters>. It has to be followed immediately by an action to save the value of token.val. Every time you scan for a new token the value of token.val is refreshed - you cannot hope that it will magically stay the same until it is needed later.

(e) The semantic check on the number of jigs can only be applied after the medley is complete. The check for the longest medley can only be applied after the medley is complete. Surely that was obvious, yet some solutions tried applying the check as soon as a jig was detected.

(f) There was no need to use string variables to save strings like "March". You only had to count the tunes!

(g) Don't be tempted to play around with comparisons like if (sym.kind = SYM.BandSym). They require too deep a knowledge of the inner workings of Coco/R and sooner or later will lead you astray.

(h) Most important of all - you cannot just write down some (. random actions .) at arbitrary points in the grammar. Their positioning is crucially important. You were warned about this in lectures, after all!

  import Library.*;  /* so that we have access to the IO library */

  COMPILER Event $CN
  /* Describe simple pipe band competition event.  This simple problem is best handled with
     some simple static (global) fields, rather than complex parameter passing */

    static int bandCount = 0, longestCount = 0, tuneCount, jigCount;
    static String longestBand, nextBand;

  IGNORECASE

  CHARACTERS
    letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_".

  TOKENS
    NameOfBand = letter { letter } .

  IGNORE CHR(0) .. CHR(31)

  PRODUCTIONS
    Event
    = Band { Band }             (. IO.writeLine(bandCount + " bands competed");
                                   IO.writeLine("Longest medley played by " + longestBand); .)
    .

    Band                        (. bandCount++; .)
    = NameOfBand                (. nextBand = token.val; // must save it here for later .)
      Medley                    (. if (jigCount < 2)
                                     SemError(nextBand + " disqualified - too few jigs");
                                   if (tuneCount > longestCount) {
                                     longestBand = nextBand;
                                     longestCount = tuneCount;
                                   } .)
    .

    Medley                      (. jigCount = 0; tuneCount = 0; .)
    = OneTune { OneTune }
      "break" .

    OneTune                     (. tuneCount ++; // at least one tune played here .)
    =   "March"
      | "SlowMarch"
      | "Jig"                   (. jigCount++;   // count these separately .)
      | "Hornpipe"
      | "Reel"
      | "Strathspey"
        { "Strathspey"          (. tuneCount++; // we can get several tunes here .)
        } "Reel"                (. tuneCount++; // and one more tune here too .)
    .

  END Event.

3. The Cocol grammar below (RE.ATG) describes a sequence of Regular Expressions (written one to a line) using the conventions discussed during the course.

      COMPILER RE   /* Regular expression grammar */

      CHARACTERS
        lf         = CHR(10) .
        control    = CHR(0) .. CHR(31) .
        noquote1   = ANY - control - "'" .
        noquote2   = ANY - control - '"' .
        meta       = "()*|[]-?+" .
        simple     = ANY - control - "'" - '"' - meta .

      TOKENS
        atomic     = simple .
        escaped    = "'" noquote1 "'" | '"' noquote2 '"' .
        EOL        = lf .

      IGNORE  CHR(0) .. CHR(9) + CHR(11) .. CHR(31)

      PRODUCTIONS
        RE         = { Expression EOL } EOF .
        Expression = Term { "|" Term } .
        Term       = Factor { Factor } .
        Factor     = Element [ "*" | "?" | "+" ] .
        Element    = Atom | Range | "(" Expression ")" .
        Range      = "[" OneRange { OneRange } "]" .
        OneRange   = Atom [ "-" Atom ] .
        Atom       = atomic | escaped .
      END RE.

How would you add appropriate actions and attributes so that you could generate a program that would parse a sequence of regular expressions and report on the alphabets used in each one. For example, given input like

            a | b c d | ( x y z )*
            [a-g A-G] [x - z]?
            a? "'" z+

the output should be something like

            Alphabet = a b c d x y z
            Alphabet = A B C D E F G a b c d e f g x y z
            Alphabet = ' a z

This was not very well done, unfortunately. Very few people seemed to realise that one only needs to add a character to the alphabet the first time it is detected, and many peopl set up complex arraylist structures of strings, or of characters, or manipulated strings and stringbuilder objects in tedious ways. The recommended way to solve this sort of problem is to use a set (an alphabet is a set of symbols, the Good Book wuill tell you on page 86) or, if you don't want to do that, use an array of booleans, indexed by the character values, where setting an element to true denotes membership of that character. Something like the code you see below where, once again, a simple global static structure will suffice since the grammar is non-recursive. Incidentally, very few people handled the OneRange actions anywhere near correctly, meaning that they made no attempt to incorporate all the characters in a range like [a-g] (meaning abcdefg).

  import Library.*;  /* so that we have access to the IO library */

  COMPILER RE $CN
  /* Regular expression grammar */

  static boolean[] alphabet = new boolean[256];

  CHARACTERS
    lf       = CHR(10) .
    control  = CHR(0) .. CHR(31) .
    noquote1 = ANY - control - "'" .
    noquote2 = ANY - control - '"' .
    meta     = "()*|[]-?+" .
    simple   = ANY - control - "'" - '"' - meta .

  TOKENS
    atomic  = simple .
    escaped = "'" noquote1 "'" | '"' noquote2 '"' .
    EOL     = lf .

  IGNORE  CHR(1) .. CHR(9) + CHR(11) .. CHR(31)

  PRODUCTIONS
    RE                                (. int i; .)
    = {                               (. for (i = 1; i < 256; i++)
                                           alphabet[i] = false; .)
        Expression EOL                (. IO.write("Alphabet = ");
                                         for (i = 1; i < 256; i++)
                                           if (alphabet[i]) IO.write( (char) i);
                                         IO.writeLine(); .)
      } EOF .

    Expression
    = Term { "|" Term } .

    Term
    = Factor { Factor } .

    Factor
    = Element [ "*" | "?" | "+" ] .

    Element                           (. char ch; .)
    =   Atom<out ch>                  (. alphabet[ch] = true; .)
      | Range | "(" Expression ")" .

    Range
    = "[" OneRange { OneRange } "]" .

    OneRange                          (. char ch, ch1, ch2; .)
    = Atom<out ch1>                   (. alphabet[ch1] = true; .)
      [ "-" Atom<out ch2>             (. if (ch2 < ch1)
                                           SemError("invalid range");
                                         else
                                           for (ch = ch1; ch <= ch2; ch++)
                                             alphabet[ch] = true; .)
      ] .

    Atom<out char ch>                 (. ch = 0; .)
    = (   atomic                      (. ch = token.val.charAt(0); .)
        | escaped                     (. ch = token.val.charAt(1); .)
      )

    .

  END RE.


Home  © P.D. Terry