Prac 23 test

Computer Science 301 - 2003

Test on Prac 23 - Week beginning 22 September 2003 - Solutions

This test is intended to check your ability to write simple scanners and parsers.

1. You won't believe your luck - you can get this 100% right! What are your name (surname) and student number? [1]

Pat (Terry) 663T0844

2. Suppose we wanted to write a scanner to match the specification expressed in Cocol as follows (only two kinds of tokens are valid!)

       CHARACTERS
         digit = "0123456789" .
         eol   = CHR(10) .
       IGNORE CHR(1) .. CHR(32)
       COMMENTS FROM ";" TO eol
       TOKENS
         float   = digit { digit } '.' { digit } .
         integer = digit { digit } .

The routine must assign either floatSym or intSym to sym.kind before leaving the routine (or assign the value of noSym if it comes across an invalid token). Complete the skeleton code below:

      static void getSym() {  // Scans for next sym
        StringBuffer symLex = new StringBuffer();
        int symKind = noSym;
        while (ch > EOF && ch <= ' ') getChar();
        if (ch == ';') {                             // comment
          do getChar(); while (ch != '\n');
          getChar(); getSym();                       // now look for the symbol again
          return;
        }
        if (Character.isDigit(ch)) {                 // must be okay
          symKind = intSym;                          // initial assumption
          while (Character.isDigit(ch))
            symLex.append(ch); getChar();            // scan over the first few digits
          if (ch == '.') {                           // okay, we must have a floating number
            symKind = floatSym;                      // so change our assumption
            do                                       // now scan past the digits after the point
              symLex.append(ch); getChar();
            while (Character.isdigit(ch))
          }
        }
        else {
          symLex.append(ch);
          switch (ch) {
            case EOF:
              symLex = new StringBuffer("EOF");      // special representation
              symKind = EOFSym; break;               // no need to getChar here, of course
            default :
              symKind = noSym; getChar(); break;
          }
        }
        sym = new Token(symKind, symLex.toString());
      }

One might think that another way would incorporate code like this, which some people tried

        if (Character.isDigit(ch)) {                 // must be okay
          symKind = intSym;                          // initial assumption
          while (Character.isDigit(ch) || ch == '.')
            if (ch == '.') symKind = floatSym;       // change our assumption
            symLex.append(ch); getChar();            // scan over the number
          }
        }
       else ...

but this is subtly WRONG - it would accept something like 45.67.89 as a valid floating number (can you see why? If not, better come to ask me!)

3. Suppose we have a grammar with the PRODUCTIONS

     PRODUCTIONS
        A = '(' { '+' number } [ B ] ')' .
        B = '-' number | '.' .

Assuming that you have an accept routine like the one you used last week, and a scanner that can recognise tokens that might be described by the enumeration

EOFSym, noSym, plusSym, minusSym, numSym, lparenSym, rparenSym, periodSym

how would you complete the parser routines below?

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
    }

    static void B () {
    //  B = '-' number | '.' .
    }

This was rather badly done, and it is very easy. An important idea that many people have missed is that parser routines are "driven" by checking for FIRST tokens, not by negative tests for FOLLOW tokens. So we need something like this:

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
      accept(lparenSym, "( expected");
      while (symKind == plusSym) {
        getSym(); accept(numSym, "number expected");
      }
      if (sym.kind == minusSym || sym.kind == periodSym) B();
      accept(rparenSym, ") expected");
    }

    static void B () {
    //  B = '-' number | "." .
      switch (sym.kind) {
        case minusSym:
          getSym(); accept(numSym, "- expected"); break;
        case periodSym:
          getSym(); break;
        default:
          abort("unexpected symbol"); break;
      }
    }

alternatively:

    static void B () {
    //  B = '-' number | "." .
      if (sym.kind == minusSym) {
        getSym(); accept(numSym, "- expected");
      } else accept(periodSym, "- expected");
    }

Each routine should be written in its own "context". The routine for A should not worry about what B has to do, other than to check whether B should be called. Coding A like this, which some people did, is back to front:

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
      accept(lparenSym, "( expected");
      if (sym.kind != rparenSym) {
        if (sym.kind = minusSym) B();
        while (sym.kind == plusSym) { getSym(); accept(numSym, "number expected"); }
      } else getSym();
    }

Finally, there were several submissions that tested for EOFSym in various places. While I see what you may have been thinking (since the associated prac had used EOFSym), in this grammar there is no mention of EOFSym, so don't be tempted to introduce it into the productions!