Please answer all questions in the spaces provided (two sides). Max 30
This test is intended to check your ability to write simple scanners and parsers. Textbooks may not be used.
1. I have really lost it this week. I cannot even remember the question! But what is the answer?
Pat Terry 63T0884
2. Suppose we have a grammar which included the PRODUCTIONS
PRODUCTIONS A = [ B ] '(' number { '+' number | B } ')' . B = '-' number '.' | '.' .
Assuming that you have Accept and Abort routines like those you used last week, and a scanner GetSym() that can recognise tokens that might be described by the enumeration
EOFSym, noSym, plusSym, minusSym, numSym, lparenSym, rparenSym, periodSym;
how would you complete the parser routines below? Note: the grammar might have other productions, some of which might have right sides that include the non-terminals A and B. [11]
I was hoping for solutions on the line of the following:
static void A () { // A = [ B ] '(' number { '+' number | B } ')' . if (sym.kind == minusSym || sym.kind == periodSym) B(); Accept(lparenSym, "( expected"); Accept(numSym, "number expected"); while (sym.kind == plusSym || sym.kind == minusSym || sym.kind == periodSym) { if (sym.kind == plusSym) { GetSym(); Accept(numSym, "number expected") } else B(); } Accept(rparenSym, ")" expected); } static void B () { // B = '-' number '.' | '.' . if (sym.kind == minusSym) { GetSym(); Accept(numSym, "number expected"); Accept(periodSym, ". expected"; } else Accept(periodSym, ". expected"; }
The method for B could also have been coded with a switch statement. If you do this, be careful to include the error case:
static void B () { // B = '-' number '.' | '.' . switch (sym.kind) case minusSym : { GetSym(); Accept(numSym, "number expected"); Accept(periodSym, ". expected"; break; } case periodSym : GetSym(); break; default: Abort("invalid start to B"); break; } }
There might be a temptation to modify the grammar and work from:
static void B () { // B = '-' number '.' | '.' . // changed to B = [ '-' number ] '.' . if (sym.kind == minusSym) { GetSym(); Accept(numSym, "number expected"); } Accept(periodSym, ". expected"; }
The grammars are syntactically equivalent, but one would not be able to do this transformation with impunity if different "actions" were to be associated with the different periods.
Several submissions came up with the following for A
static void A () { // A = [ B ] '(' number { '+' number | B } ')' . if (sym.kind == minusSym || sym.kind == periodSym) B(); Accept(lparenSym, "( expected"); Accept(numSym, "number expected"); while (sym.kind != rparenSym) { if (sym.kind == plusSym) { GetSym(); Accept(numSym, "number expected"); } else B(); } Accept(rparenSym, ")" expected); }
that is, drove the repetition { '+' number | B } "from the right" instead of "from the left". I don't know that this is a good idea at all. If the closing parenthesis were omitted the loop would be in danger of not terminating nicely.
3. Suppose we wanted to write a scanner to match the specification of numbers used in the assembler grammar recently, which can be expressed in Cocol as follows (only three kinds of tokens are valid!) [18]
CHARACTERS digit = "0123456789" . binDigit = "01" . hexDigit = digit + "ABCDEF" . eol = CHR(10) . COMMENTS FROM "{" TO "}" IGNORE CHR(1) .. CHR(32) /* character handler returns CHR(0) as EOF */ TOKENS decNumber = digit { digit } . /* digits only */ binNumber = binDigit { binDigit } "%" . /* trailing % */ hexNumber = digit { hexDigit } "H" . /* trailing H */
The routine must assign either decSym or binSym or hexSym to symKind before leaving the routine (or assign the value of noSym if it comes across an invalid token, or eofSym at the end of file). Complete the skeleton code below to show how this might be done:
This was deliberately a tough question, and sadly not many people saw the problem - which is that tokens like 01103% (with a non binary digit in what appears to be a binary sequence) or 1A3 (a hex number without a trailing H) should give rise to an error. One way to handle this is suggested below.
Secondly, the comment handling left a lot to be desired, and in many cases I wondered whether their authors could possibly have understood the practical in this respect. If a comment is detected one has firstly to ignore it, and then to press on to find a symbol - and this might only happen after one has found another comment, in a sequence like { comment } { comment(after more whitespace) } 1234.
One way of handling comments is to have a recursive call, but you must be careful to ensure that when the "inner" call returns, control flows correctly.
static void GetSym() { // Scans for next sym StringBuilder symLex = new StringBuilder(); int symKind = noSym; // default is noSym while (ch > EOF && ch <= ' ') GetChar(); if (ch == '{') { // comment do GetChar(); while (ch != '}' && ch != EOF); if (ch == '}') { GetChar(); GetSym(); return; // note break out! } else output.Write("unclosed comment"); } else { if (Char.IsDigit(ch)) { // hopefully a number boolean seenHex = false, seenDec = false; // it might be binary do { seenDec = seenDec || ch > '1' && ch <= '9'; // no, it might be decimal seenHex = seenHex || ch >= 'A' && ch <= 'F'; // no, it might be hexadecimal symLex.Append(ch); GetChar(); } while (IsHexDigit(ch)); // now have seen all the digits if (ch == '%') { // check that we had not seen bad digits symLex.Append(ch); GetChar(); if (!seenHex && !seenDec) // only seen 0s and 1s symKind = binNumber; // else it must be noSym (eg 121% or 1E1%) } else if (ch == 'H') { symLex.Append(ch); GetChar(); symKind = hexNumber; // it must have been hexadecimal } else if (!seenHex) symKind = decNumber; // else it must be noSym (eg 1A34) } else // unknown symbol or EOF if (ch == EOF) symKind = EOFSym; sym = new Token(symKind, symLex.ToString()); } }
If you don't like the recursive solution you might like this one instead, which also shows another way of discrimination between numbers (some submissions tried something like this):
static void GetSym() { // Scans for next sym StringBuilder symLex = new StringBuilder(); int symKind = noSym; // default is noSym while (ch > EOF && ch <= ' ') GetChar(); // leading whitespace while (ch == '{') { // there may be several comments in a row do GetChar(); while (ch != '}' && ch != EOF); if (ch == EOF) { reportError("unclosed comment"); break; // out of the outer loop } while (ch > EOF && ch <= ' ') GetChar(); // skip trailing whitespace } if (Character.isDigit(ch)) { // hopefully a number int status = bin; while (isBinDigit(ch)) { // leading binary digits symLex.Append(ch); GetChar(); } while (Char.IsDigit(ch)) { // decimal or binary digits status = dec; symLex.Append(ch); GetChar(); } while(IsHexDigit(ch)) { // hex, decimal or binary status = hex; symLex.Append(ch); GetChar(); } switch (ch) { // character following digits case 'H' : symLex.Append(ch); GetChar(); symKind = hexNumber; break; case '%' : symLex.Append(ch); GetChar(); if (status == bin) symKind = binNumber; // else it is in error break; default : if (status != hex) symKind = decNumber; // else it is in error break; } } else // unknown symbol or EOF if (ch == EOF) symKind = EOFSym; sym = new Token(symKind, symLex.ToString()); }