Test on week 23

Computer Science 301 - 2007 - Test on Week 23 - Solutions

Please answer all questions in the spaces provided (four sides). Max 30

This test is intended to check your ability to write simple scanners and parsers. Textbooks may not be used.

1. You can get this 100% right! What are your name (surname) and student number? [1]

Pat (Terry) 63T0884

2. Suppose we have a Cocol grammar

     COMPILER Declarations
     /* Describe a subset of the forms that declarations can assume in a simple C-like language */

     CHARACTERS
       digit  = "0123456789" .
       letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .

     TOKENS
       ident  = letter { letter | digit | "_" ( letter | digit ) } .

     COMMENTS FROM "{" TO "}" NESTED

     IGNORE CHR(0) .. CHR(31)

     PRODUCTIONS
       Declarations = { DecList } EOF .
       DecList      = Type OneDecl { "," OneDecl } ";"  .
       Type         = "int" | "void" | "bool" | "char" .
       OneDecl      = "*" OneDecl | Direct .
       Direct       = ( ident | "(" OneDecl ")" ) [ Params ] .
       Params       = "(" [ OneParam { "," OneParam } ] ")" .
       OneParam     = Type [ OneDecl ] .
     END Declarations.

Assuming that you have accept and abort routines like those you used in this last week, and a scanner getSym() that can recognise tokens that might be described by the enumeration

EOFSym, noSym, identSym, intSym, voidSym, boolSym, charSym, commaSym, lparenSym, rparenSym,
semicolonSym, starSym;

how would you complete the parser routines below?

Several submissions added EOFSym into FirstDeclarations. I fear I may have confused some of you in an earlier tutorial. EOFSym is not a "real" terminal. It is injected into the grammar above as a convenient way of making sure that an error message will result if the input ends prematurely.

    static SymSet
      FirstDeclarations = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}),
      FirstDeclList     = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}),
      FirstType         = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}),
      FirstOneDecl      = new SymSet(new int[] {starSym, identSym, lparenSym}),
      FirstDirect       = new SymSet(new int[] {identSym, lparenSym}),
      FirstParams       = new SymSet(new int[] {lparenSym}),
      FirstOneParam     = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}),

    static void Declarations () {
    // Declarations = { DecList } EOF .
      while FirstDeclList.contains(sym.kind) DecList();
      accept(EOFSym, "EOF expected");
    }

    static void DecList () {
    // DecList = Type OneDecl { "," OneDecl } ";"  .
      Type();
      OneDecl();
      while (sym.kind == commaSym) {
        getSym(); OneDecl();
      }
      accept(semicolonSym, "; expected");
    }

    static void Type () {
    // Type = "int" | "void" | "bool" | "char" .
      if Firsttype.contains(sym.kind) getSym();
      else abort("invalid type");
    }

    static void OneDecl () {
    // OneDecl = "*" OneDecl | Direct .
      if (sym.kind == starSym) {
        getSym(); oneDecl();
      }
      else if (FirstDirect.contains(sym.kind)) Direct();
      else abort("invalid start to OneDecl");
    }

Several submissions simplified this to the following, leaving out the "abort" call:

    static void OneDecl () {
    // OneDecl = "*" OneDecl | Direct .
      if (sym.kind == starSym) {
        getSym(); oneDecl();
      }
      else Direct();
    }

This is living dangerously! Similarly, a "safe" way to code Direct would be

    static void Direct () {
    // Direct = ( ident | "(" OneDecl ")" ) [ Params ] .
      if (sym.kind == identSym()) getSym;
      else if (sym.kind == lparenSym) {
        getSym;
        OneDecl();
        accept(rparenSym, ") expected");
      }
      else abort("invalid start to Direct");
      if (sym.kind == lparenSym) Params();
    }

although the following is also "safe", and is a bit shorter. The point is that one should not assume that when Direct is called there will be an identSym or lparenSym as the next token.

    static void Direct () {
    // Direct = ( ident | "(" OneDecl ")" ) [ Params ] .
      if (sym.kind == identSym()) getSym;
      else {
        accept(lparenSym, "( expected");
        OneDecl();
        accept(rparenSym, ") expected");
      }
      if (sym.kind == lparenSym) Params();
    }

    static void Params () {
    // Params = "(" [ OneParam { "," OneParam } ] ")" .
      accept(lparenSym, "( expected");
      if (FirstOneParam.contains(sym.kind)) {
        OneParam();
        while (sym.kind == commaSym) {
          getSym(); OneParam();
        }
      }
      accept(rparenSym, ") expected");
    }

    static void OneParam () {
    // OneParam  = Type [ OneDecl ] .
      Type();
      if (FirstOneDecl.contains(sym.kind)) OneDecl();
    }

3. A student has been trying to develop a scanner for the system. The part of it that deals with recognizing identifiers and ignoring comments has led her to develop the following code. Sadly, it is not quite correct. Suggest the changes needed to correct it.

This was not well done. Many people seemed not to have noticed that the scanner was required to discard (possibly) nested comments, and that an underscore in an identifier had to be followed by a letter or digit. And don't let the scanner call abort. The scanner should simply return noSym as the value of sym.kind and let the parser react accordingly.

The skeleton code was as follows.

      static void getSym() {  // Scans for next sym
        StringBuffer symLex = new StringBuffer();
        int symKind = noSym;                      // assume the worst!
        while (ch > EOF && ch <= ' ') getChar();  // whitespace
        if (ch = '{' )  {                         // deal with comments
          do {
            getChar();
          } while (ch != '}');
        }
        if (Character.isLetter(ch)) {             // recognize identifiers
          do {
            symLex.append(ch); getChar();
          } while (Character.isLetterOrDigit(ch));
          symKind = literalKind(symLex);
        }

This made no attempt to handle nested comments, to continue to look for a token after dealing with a comment, to guard against unclosed comments, and so on. It also made no attempt to handle identifiers with underscores that have to be followed by a letter or digit. A solution follows:

      static int literalKind(StringBuffer lex) {  // handle keywords
        String s = lex.toString();
        if (s.equals("bool"))  return boolSym;
        if (s.equals("char"))  return charSym;
        if (s.equals("int"))   return intSym;
        if (s.equals("void"))  return voidSym;
        return identSym;
      }

      static void getSym() {  // Scans for next sym
        StringBuffer symLex = new StringBuffer();
        int symKind = noSym;                         // assume the worst!
        while (ch > EOF && ch <= ' ') getChar();     // whitespace
        if (ch = '{' ) {                             // deal with comments
          int nesting = 1;                           // level of nesting
          do {
            getChar();
            if (ch == EOF) {                         // unclosed comment
              sym = new Token(EOFSym, "EOF");
              return;
            }
            if (ch == '{') nesting++;                // increase nesting
            if (ch == '}') nesting--;                // decrease nesting
          } while (nesting != 0);
          getChar(); getSym();                       // recursive call - stil need to find a token
          return;
        }
        if (Character.isLetter(ch)) {                // recognize identifiers
          do {
            symLex.append(ch); getChar();
            if (ch = '_') {                          // must be followed letter | digit
              symLex.append(ch);
              getCh();
              if (!Character.isLetterOrDigit(ch)) {  // bad identifier - return noSym
                sym = new Token(noSym, symLex.toString());
                return;
              }
            }
          } while (Character.isLetterOrDigit(ch));
          symKind = literalKind(symLex);
        }

        else {
          // Please don't concern yourself with the rest of this method
        }
        sym = new Token(symKind, symLex.toString());
      }