Test on Practical 23

Computer Science 301 - 2005

Test on Prac 23 - Week beginning 3 October 2005 - Solutions

This test is intended to check your ability to write simple scanners and parsers.

1. You won't believe your luck - you can get this 100% right! What are your name (surname) and student number? [1]

Pat (Terry) 663T0844

2. Suppose we wanted to write a scanner to match the specification expressed in Cocol as follows (only two kinds of tokens are valid!)

       CHARACTERS
         digit = "0123456789" .
         eol   = CHR(10) .
       COMMENTS FROM ";" TO eol
       IGNORE CHR(1) .. CHR(32)  /* character handler returns CHR(0) as EOF */
       TOKENS
         float   = digit { digit } '.' { digit } .
         integer = digit { digit } .

The routine must assign either floatSym or intSym to sym.kind before leaving the routine (or assign the value of noSym if it comes across an invalid token, or eofSym at the end of file). Complete the skeleton code below to show how this might be done:

Many submissions did not deal with comments at all!

      static void getSym() {  // Scans for next sym
        StringBuffer symLex = new StringBuffer();
        int symKind = noSym;
        while (ch > EOF && ch <= ' ') getChar();
        if (ch == ';') {                             // comment
          do getChar(); while (ch != '\n' && ch := EOF);
          getChar(); getSym();                       // now look for the symbol again
          return;                                    // (will be EOFSym if unterminated)
        }
        if (Character.isDigit(ch)) {                 // must be okay
          symKind = intSym;                          // initial assumption
          while (Character.isDigit(ch))
            symLex.append(ch); getChar();            // scan over the first few digits
          if (ch == '.') {                           // okay, we must have a floating number
            symKind = floatSym;                      // so change our assumption
            do {                                       // now scan past the digits after the point
              symLex.append(ch); getChar();
            } while (Character.isdigit(ch))
          }
        }
        else {
          symLex.append(ch);
          switch (ch) {
            case EOF:
              symLex = new StringBuffer("EOF");      // special representation
              symKind = EOFSym; break;               // no need to getChar here, of course
            default :
              symKind = noSym; getChar(); break;
          }
        }
        sym = new Token(symKind, symLex.toString());
      }

One might think that another way would incorporate code like this, which some people tried

        if (Character.isDigit(ch)) {                 // must be okay
          symKind = intSym;                          // initial assumption
          while (Character.isDigit(ch) || ch == '.')
            if (ch == '.') symKind = floatSym;       // change our assumption
            symLex.append(ch); getChar();            // scan over the number
          }
        }
       else ...

but this is subtly WRONG - it would accept something like 45.67.89 as a valid floating number (can you see why? If not, better come to ask me!)

3. Suppose we have a grammar which included the PRODUCTIONS

     PRODUCTIONS
        A = '(' { '+' number } [ B ] ')' .
        B = '-' number | '.' .

Assuming that you have an accept routine like the one you used last week, and a scanner that can recognise tokens that might be described by the enumeration

EOFSym, noSym, plusSym, minusSym, numSym, lparenSym, rparenSym, periodSym

how would you complete the parser routines below?

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
    }

    static void B () {
    //  B = '-' number | '.' .
    }

This not always well done, but it was very easy. An important idea that many people have missed is that parser routines are "driven" by checking for FIRST tokens, not by negative tests for FOLLOW tokens. So we need something like this:

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
      accept(lparenSym, "( expected");
      while (symKind == plusSym) {
        getSym(); accept(numSym, "number expected");
      }
      if (sym.kind == minusSym || sym.kind == periodSym) B();
      accept(rparenSym, ") expected");
    }

    static void B () {
    //  B = '-' number | "." .
      switch (sym.kind) {
        case minusSym:
          getSym(); accept(numSym, "- expected"); break;
        case periodSym:
          getSym(); break;
        default:
          abort("unexpected symbol"); break;
      }
    }

alternatively:

    static void B () {
    //  B = '-' number | "." .
      if (sym.kind == minusSym) {
        getSym(); accept(numSym, "- expected");
      } else accept(periodSym, "- expected");
    }

Each routine should be written in its own "context". The routine for A should not worry about what B has to do, other than to check whether B should be called. Coding A like this, which some people did, is back to front:

    static void A () {
    //  A = '(' { '+' number } [ B ] ')' .
      accept(lparenSym, "( expected");
      if (sym.kind != rparenSym) {
        if (sym.kind = minusSym) B();
        while (sym.kind == plusSym) { getSym(); accept(numSym, "number expected"); }
      } else getSym();
    }

Finally, there were several submissions that tested for EOFSym in various places. While I see what you may have been thinking (since the associated prac had used EOFSym), in this grammar there is no mention of EOFSym, so don't be tempted to introduce it into the productions!