Computer Science 3 - 2001 - Test on Prac 18 THURSDAY TEST 2. Suppose we wanted to write a scanner to match the specification expressed in Cocol as follows (only two kinds of tokens are valid!) CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . digit = "0123456789" . eol = CHR(13) . IGNORE CHR(1) .. CHR(32) COMMENTS FROM "//" TO eol TOKENS begin = "begin" . identifier = "a" { letter | digit } . The routine must assign either beginSym or identSym to Sym before leaving the routine (or assign the value of unknownSym if it comes across an invalid token). Complete the skeleton code below to show how this might be done: Very few people saw that they were expected to be able to skip comments! A full solution is a bit messy, unless you introduce the idea of storing letters in an array. What I was looking for was the insight that you had to scan the remainder of the word, and, sadly, not everyone saw this. Here is an array based solution: void GetSym (void) { // Scans for next sym while (ch > 0 && ch <= ' ') GetChar(); sym = unknownSym; // not recognised yet switch (ch) { case '/' : // possible comment GetChar(); if (ch == '/') { // definite comment do GetChar(); while (ch != '\n'); GetChar(); GetSym(); // now look for the symbol again } break; case 'a' : sym = identSym; do // scan remainder of ident GetChar(); while (isalpha(ch)); break; case 'b' : char spelling[100]; int i = 0; do // build up the word { spelling[i++] = ch; GetChar(); } while (isalpha(ch)); spelling[i] = '\0'; // null terminator if (strcmp(spelling, "begin") == 0) sym = beginSym; default : break; // already handled - sym = unknownSym; } } 3. Suppose we have a grammar with the PRODUCTIONS PRODUCTIONS A = '(' { '+' number } [ B ] ')' . B = '-' number | "." . Assuming that you have an accept routine like the one you used last week, and a scanner that can recognise tokens that might be described by the enumeration enum SYMBOLS { EOFSym, plusSym, minusSym, numberSym, lparenSym, rparenSym, periodSym, unknownSym}; how would you complete the parser routines below? void A (void) { // A = '(' { '+' number } [ B ] ')' . } void B (void) { // B = '-' number | "." . } This was rather badly done, and it is very easy. An important idea that many people have missed is that parser routines are "driven" by checking for FIRST tokens, not by negative tests for FOLLOW tokens. So we need something like this: void A (void) { // A = '(' { '+' number } [ B ] ')' . accept(lparenSym, "( expected"); while (Sym == plusSym) { GetSym(); accept(numberSym, "number expected"); } if (sym == minusSym || sym == periodSym) B(); accept(rparenSym, ") expected"); } void B (void) { // B = '-' number | "." . switch (sym) { case minusSym: GetSym(); accept(numberSym, "- expected"); break; case periodSym: GetSym(); break; default : error("unexpected symbol"); break; } } alternatively: void B (void) { // B = '-' number | "." . if (sym == minusSym) { GetSym(); accept(numberSym, "- expected"); } else accept(periodSym, "- expected"); } Each routine should be written in its own "context". The routine for A should not worry about what B has to do, other than to check whether B should be called. Coding A like this, which some people did, is back to front: void A (void) { // A = '(' { '+' number } [ B ] ')' . accept(lparenSym, "( expected"); if (Sym != rparenSym) { if (Sym = minusSym) B(); while (Sym == plusSym) { GetSym(); accept(numberSym, "number expected"); } else error(") expected"); } Finally, there were several submissions that tested for EOFSym in various places. While I see what you may have been thinking (since the associated prac had used EOFSym), in this grammar there is no mention of EOFSym, so don't be tempted to introduce it into the productions! FRIDAY TEST 2. Suppose we wanted to write a scanner to match the specification expressed in Cocol as follows (only two kinds of tokens are valid!) CHARACTERS digit = "0123456789" . eol = CHR(13) . IGNORE CHR(1) .. CHR(32) COMMENTS FROM ";" TO eol TOKENS float = digit { digit } '.' { digit } . integer = digit { digit } . The routine must assign either floatSym or intSym to Sym before leaving the routine (or assign the value of unknownSym if it comes across an invalid token). Complete the skeleton code below to show how this might be done: Very few people saw that they were expected to be able to skip comments! The best way to do this is probably as follows: void GetSym (void) { // Scans for next sym while (ch > 0 && ch <= ' ') GetChar(); if (ch == ';') { // comment do GetChar(); while (ch != '\n'); GetChar(); GetSym(); // now look for the symbol again return; } if (isdigit(ch)) { // must be okay Sym = intSym; // initial assumption while (isdigit(ch)) GetChar(); // scan over the first few digits if (ch == '.') { // okay, we must have a floating number Sym = floatSym; // so change our assumption do // now scan past the digits after the point GetChar(); while (isdigit(ch)) } } else Sym = unknownSym; } Another way might appear to be like this, which some people tried void GetSym (void) { // Scans for next sym while (ch > 0 && ch <= ' ') GetChar(); sym = unknownSym; // not recognised yet if (ch == ';') { // comment do GetChar(); while (ch != '\n'); GetChar(); GetSym(); // now look for the symbol again return; } if (isdigit(ch)) { Sym = intSym; // initial assumption while (isdigit(ch) || ch = '.' ) { // scan over the rest GetChar(); if (ch == '.') Sym = floatSym; // change our assumption } } else Sym = UnknownSym; } but this is subtly WRONG - it would accept something like 45.67.89 as a valid floating number (can you see why? If not, better come to ask me!) 3. Suppose we have a grammar with the PRODUCTIONS PRODUCTIONS A = [ B { C } ] . B = number | identifier . C = '(' string B ')' . Assuming that you have an accept routine like the one you used last week, and a scanner that can recognise tokens that might be described by the enumeration enum SYMBOLS { EOFSym, numSym, identSym, stringSym, lparenSym, rparenSym, unknownSym }; how would you complete the parser routines below? void A (void) { // A = [ B { C } ] . } void B (void) { // B = number | identifier . } void C (void) { // C = '(' string B ')' . } This was very badly done, and it is very easy. An important idea that many people have missed is that parser routines are "driven" by checking for FIRST tokens, not by negative tests for FOLLOW tokens. So we need something like this: void A (void) { // A = [ B { C } ] . if (Sym == numSym || Sym == identSym) { B(); while (Sym == lparenSym) C(); } } void B (void) { // B = number | identifier . if (Sym == numSym || Sym == identSym) GetSym(); else error("invalid B"); } void C (void) { // C = '(' string B ')' . accept(lparenSym, "( expected"); accept(stringSym, "string expected"); B(); accept(rparenSym, ") expected"); } Each routine should be written in its own "context". The routine for A should not worry about what B has to do, other than to check whether B should be called. Coding like this, which several people did, "works", but is illogical - it works only if there are no other productions in the system that depend on A, B or C. void A (void) { // A = [ B { C } ] . if (Sym == numSym || Sym == identSym) { GetSym(); while (Sym == lparenSym) { GetSym(); C(); } } } void B (void) { // B = number | identifier . // nothing, already handled in A } void C (void) { // C = '(' string ')' . // already accepted ( in A, so start from string accept(stringSym, "string expected"); B(); accept(rparenSym, ") expected"); } Finally, there were several submissions that tested for EOFSym in various places. While I see what you may have been thinking (since the associated prac had used EOFSym), in this grammar there is no mention of EOFSym, so don't be tempted to introduce it into the productions!