Please answer all questions in the spaces provided (four sides). Max 30
This test is intended to check your ability to write simple scanners and parsers. Textbooks may not be used.
1. You can get this 100% right! What are your name (surname) and student number? [1]
Pat (Terry) 63T0884
2. Suppose we have a Cocol grammar
COMPILER Declarations /* Describe a subset of the forms that declarations can assume in a simple C-like language */ CHARACTERS digit = "0123456789" . letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . TOKENS ident = letter { letter | digit | "_" ( letter | digit ) } . COMMENTS FROM "{" TO "}" NESTED IGNORE CHR(0) .. CHR(31) PRODUCTIONS Declarations = { DecList } EOF . DecList = Type OneDecl { "," OneDecl } ";" . Type = "int" | "void" | "bool" | "char" . OneDecl = "*" OneDecl | Direct . Direct = ( ident | "(" OneDecl ")" ) [ Params ] . Params = "(" [ OneParam { "," OneParam } ] ")" . OneParam = Type [ OneDecl ] . END Declarations.
Assuming that you have accept and abort routines like those you used in this last week, and a scanner getSym() that can recognise tokens that might be described by the enumeration
EOFSym, noSym, identSym, intSym, voidSym, boolSym, charSym, commaSym, lparenSym, rparenSym,
semicolonSym, starSym;
how would you complete the parser routines below?
Several submissions added EOFSym into FirstDeclarations. I fear I may have confused some of you in an earlier tutorial. EOFSym is not a "real" terminal. It is injected into the grammar above as a convenient way of making sure that an error message will result if the input ends prematurely.
static SymSet FirstDeclarations = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}), FirstDeclList = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}), FirstType = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}), FirstOneDecl = new SymSet(new int[] {starSym, identSym, lparenSym}), FirstDirect = new SymSet(new int[] {identSym, lparenSym}), FirstParams = new SymSet(new int[] {lparenSym}), FirstOneParam = new SymSet(new int[] {intSym, voidSym, boolSym, charSym}), static void Declarations () { // Declarations = { DecList } EOF . while FirstDeclList.contains(sym.kind) DecList(); accept(EOFSym, "EOF expected"); } static void DecList () { // DecList = Type OneDecl { "," OneDecl } ";" . Type(); OneDecl(); while (sym.kind == commaSym) { getSym(); OneDecl(); } accept(semicolonSym, "; expected"); } static void Type () { // Type = "int" | "void" | "bool" | "char" . if Firsttype.contains(sym.kind) getSym(); else abort("invalid type"); } static void OneDecl () { // OneDecl = "*" OneDecl | Direct . if (sym.kind == starSym) { getSym(); oneDecl(); } else if (FirstDirect.contains(sym.kind)) Direct(); else abort("invalid start to OneDecl"); }
Several submissions simplified this to the following, leaving out the "abort" call:
static void OneDecl () { // OneDecl = "*" OneDecl | Direct . if (sym.kind == starSym) { getSym(); oneDecl(); } else Direct(); }
This is living dangerously! Similarly, a "safe" way to code Direct would be
static void Direct () { // Direct = ( ident | "(" OneDecl ")" ) [ Params ] . if (sym.kind == identSym()) getSym; else if (sym.kind == lparenSym) { getSym; OneDecl(); accept(rparenSym, ") expected"); } else abort("invalid start to Direct"); if (sym.kind == lparenSym) Params(); }
although the following is also "safe", and is a bit shorter. The point is that one should not assume that when Direct is called there will be an identSym or lparenSym as the next token.
static void Direct () { // Direct = ( ident | "(" OneDecl ")" ) [ Params ] . if (sym.kind == identSym()) getSym; else { accept(lparenSym, "( expected"); OneDecl(); accept(rparenSym, ") expected"); } if (sym.kind == lparenSym) Params(); } static void Params () { // Params = "(" [ OneParam { "," OneParam } ] ")" . accept(lparenSym, "( expected"); if (FirstOneParam.contains(sym.kind)) { OneParam(); while (sym.kind == commaSym) { getSym(); OneParam(); } } accept(rparenSym, ") expected"); } static void OneParam () { // OneParam = Type [ OneDecl ] . Type(); if (FirstOneDecl.contains(sym.kind)) OneDecl(); }
3. A student has been trying to develop a scanner for the system. The part of it that deals with recognizing identifiers and ignoring comments has led her to develop the following code. Sadly, it is not quite correct. Suggest the changes needed to correct it.
This was not well done. Many people seemed not to have noticed that the scanner was required to discard (possibly) nested comments, and that an underscore in an identifier had to be followed by a letter or digit. And don't let the scanner call abort. The scanner should simply return noSym as the value of sym.kind and let the parser react accordingly.
The skeleton code was as follows.
static void getSym() { // Scans for next sym StringBuffer symLex = new StringBuffer(); int symKind = noSym; // assume the worst! while (ch > EOF && ch <= ' ') getChar(); // whitespace if (ch = '{' ) { // deal with comments do { getChar(); } while (ch != '}'); } if (Character.isLetter(ch)) { // recognize identifiers do { symLex.append(ch); getChar(); } while (Character.isLetterOrDigit(ch)); symKind = literalKind(symLex); }
This made no attempt to handle nested comments, to continue to look for a token after dealing with a comment, to guard against unclosed comments, and so on. It also made no attempt to handle identifiers with underscores that have to be followed by a letter or digit. A solution follows:
static int literalKind(StringBuffer lex) { // handle keywords String s = lex.toString(); if (s.equals("bool")) return boolSym; if (s.equals("char")) return charSym; if (s.equals("int")) return intSym; if (s.equals("void")) return voidSym; return identSym; } static void getSym() { // Scans for next sym StringBuffer symLex = new StringBuffer(); int symKind = noSym; // assume the worst! while (ch > EOF && ch <= ' ') getChar(); // whitespace if (ch = '{' ) { // deal with comments int nesting = 1; // level of nesting do { getChar(); if (ch == EOF) { // unclosed comment sym = new Token(EOFSym, "EOF"); return; } if (ch == '{') nesting++; // increase nesting if (ch == '}') nesting--; // decrease nesting } while (nesting != 0); getChar(); getSym(); // recursive call - stil need to find a token return; } if (Character.isLetter(ch)) { // recognize identifiers do { symLex.append(ch); getChar(); if (ch = '_') { // must be followed letter | digit symLex.append(ch); getCh(); if (!Character.isLetterOrDigit(ch)) { // bad identifier - return noSym sym = new Token(noSym, symLex.toString()); return; } } } while (Character.isLetterOrDigit(ch)); symKind = literalKind(symLex); } else { // Please don't concern yourself with the rest of this method } sym = new Token(symKind, symLex.toString()); }