Computer Science 3 - 2001 - Test on Prac 17 THURSDAY TEST 1. No, I still don't know: What is your surname? 2. Consider the following simple Cocol grammar that attempts to describe the Roman Number representations of the numbers 1 through 9, that is I, II, III, IV, V, VI, VII, VIII, IX. COMPILER Roman PRODUCTIONS Roman = OneToThree | FourOrNine | FiveToEight . OneToThree = "I" [ "I" ] [ "I" ] . FourOrNine = "I" ( "V" | "X" ) . FiveToEight = "V" [ OneToThree ]. END Roman. This grammar is correct in one sense, but suffers from several flaws. How many can you find, and can you write a better one? (Keep it as simple as possible; no need to add comments and ignorable sections; concentrate on the PRODUCTIONS). It is badly non-LL(1), and is ambiguous. A string like II could be derived from OneToThree by selecting the first and second I's, or the first and third I's. And the OneToThree and FourToNine alternatives both have FIRST sets consisting of a single "I". There are numerous ways of correcting it. The brute force one would be to say Roman = "I" | "II" | "III" | "IV" | "V" | "VI" | "VII" | "VIII" | "IX" | "X" . and if you suggested that you got credit! Entering more into the spirit of things might lead to productions like PRODUCTIONS Roman = OneToThreeOrFourOrNine | FiveToEight . ZeroToTwo = [ "I" [ "I" ]] . OneToThreeOrFourOrNine = "I" [ ZeroToTwo | "V" | "X" ] . FiveToEight = "V" [ "I" ZeroToTwo ] . END Roman. 3. Consider the attached, hopefully familiar Clang grammar. Suppose that you wanted to extend the Clang language so that you could allow real numeric literals like 123.45 or .45 or 45. as well as the whole number ones (like 1234). What changes would you have to make to the Cocol specification? The problem here is to make sure that all the valid combinations are allowed, unambiguously, insisting that at least one digit appear, and with no possibility of obtaining several periods in one number. So attempts like the following are wrong TOKENS number = digit { digit } [ "." ] { digit } . (* ambiguous *) number = [ "." ] digit { digit } [ "." { digit } ] . (* can get 2 "." *) number = {digit } [ "." {digit } ] . (* nullable *) Here is a better one: number = digit { digit } [ "." { digit } ] | "." digit { digit } . 4. Suppose you wanted to allow Clang to have both one and two dimensional arrays, so that you could write code like VAR Matrix[10, 10], Vector[10], I; BEGIN I := 1; WHILE I <= 10 DO BEGIN Vector[I] := Matrix[I, 10] * Matrix[I+4, I-3]; I := I + 1 END; What changes would have to be made to the Cocol specification? This was so easy that it is a pity that so many people missed it: UpperBound = "[" number [ "," number ] "]" . Designator = identifier [ "[" Expression [ "," Expression ] "]" ] . 5. A very quick and dirty solution to the stack assembler grammar was picked out of the rubbish bin early this morning. The PRODUCTIONS section read PRODUCTIONS Program = { Statement } . Statement = [ number ] Opcode [ number | string ] EOL . Opcode = "ADR" | "LIT" | "BZE" | (* all the rest were there ... *) "STO" | "INN" | "PRS" . END Program. What critical message would the marker have scribbled on this, if it had been handed in? (Assume that the TOKENS, COMMENTS, CHARACTERS and IGNORABLE sections are correct - focus your attention simply on the PRODUCTIONS as given here). Clearly this one, at least, had been encountered in the practical itself. There are two main problems (a) no distinction is drawn between the various classes of opcodes, and this is so easily specified that it should have been done (b) the grammar does not allow completely empty statements (with only the EOL at the end). Here is one better solution; there are plenty more like it. PRODUCTIONS Program = { [ Statement ] EOL } . Statement = [ number ] Operation . Operation = Opcode1 | Opcode2 number | Opcode3 string . Opcode1 = "STO" | "INN" | "VAL" (* all the ones like this *) . Opcode2 = "ADR" | "LIT" | "BZE" (* all the ones like that *) . Opcode3 = "PRS" . END Program. FRIDAY TEST 1 No, I still don't know: What is your surname? 2. Consider the following simple Cocol grammar that attempts to describe a medley of bagpipe tunes on the lines of those you have enjoyed listening to all week (you will remember that by tradition a strathspey section is always followed by at least one reel). COMPILER Medley PRODUCTIONS Medley = { Tune } "silence" . Tune = "march" | "jig" | "hornpipe" | "slowMarch" | "reel" | StrathspeyReel . StrathspeyReel = "strathspey" { "strathspey" } "reel" { "reel" } . END Medley. This grammar is correct in one sense, and incorrect in another. What is wrong with it, and could you write a better one? (Don't get side tracked into discussing the other kinds of competition in the original problem; we simply want to describe the Medley event here.) The grammar is non-LL(1), and ambiguous. The problem lies with the nullable { "reel" } clause. The better grammar would simply have StrathspeyReel = "strathspey" { "strathspey" } "reel" . and, possibly Medley = Tune { Tune } "silence" . on the grounds that if there are no tunes at all you don't really have anything to listen to at all! 3. Consider the attached, hopefully familiar Clang grammar. Suppose that you wanted to extend the Clang language so that you could allow hexadecimal numeric literals like 0FFH and binary literals like 001010B. What changes would you have to make to the Cocol specification? This can be done most efficiently by writing CHARACTERS binDigit = "01" . digit = "0123456789" . hexDigit = digit + "ABCDEFabcdef" . TOKENS number = digit { digit } | digit { hexDigit } "H" | binDigit {binDigit } "B". Note that the following (submitted in some form by various students) is wrong number = digit { digit } | hexDigit { hexDigit } "H" | binDigit {binDigit } "B". as this would allow, for example BACH, which would be a valid identifier as well as an apparently valid hexadecimal number. Numbers must start with digits, and the digits must remain in range. And, in spite of various attempts to persuade me otherwise, I remain unimpressed by this sort of thing: CHARACTERS binDigit = "01B" . digit = "0123456789" . hexDigit = digit + "ABCDEFHabcdefH" . TOKENS number = digit { digit } | digit { hexDigit } | binDigit { binDigit }. which would have allowed "numbers" like 99HHHHHH123H or B0B0B0B. 4. Some languages allow one to have variations on the IfStatement that allow one to write code like if A > B then C := D elsif A > E then Write('A is bigger than E') elsif A > G then while A > G do A := A -1 else Read (* new value of *) (A) that is, where one has a number of optional "elsif" clauses and an optional final "else" clause. How would you add this statement type to the Cocol specification of Clang? This was so easy that it was alarming to see how many people got it wrong! IfStatement = "IF" Condition "THEN" Statement { "ELSIF" Condition "THEN" Statement } [ "ELSE" Statement ] . 5. A Keen Type has decided to extend Clang to allow programmers to incorporate exponentiation into their expressions, for example, writing x^2 + y^3 to mean x2 + y3. She has had two Bright Ideas. One is to change the productions to Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Primary [ "^" Expression ] . /* ---------- addition */ Primary = Designator | number | "(" Expression ")" . AddOp = "+" | "-" . MulOp = "*" | "/" . and the other is to make the change Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Primary [ "^" Factor ] . /* ---------- addition */ Primary = Designator | number | "(" Expression ")" . AddOp = "+" | "-" . MulOp = "*" | "/" . Explain to her which is the better choice, giving reasons. Very few people saw the subtlety here. The first one suffers from the problem that in a string like the one suggested (that's why I suggested it, as a hint!) x^2 + y^3 the operators would seem to bind in such a way that it would "mean" x^(2 + y^(3)) which is probably not what one would like. In fact it is worse than that: the grammar is non-LL(1) (which is not in itself necessarily a disaster), and it is ambiguous (which is pretty awful - if you don't believe me, try drawing parse trees for 4^5*6). If you want to write the equivalent of x^(2+y^3), then the second grammar still allows you to do that, since a primary can be a parenthesized expression; the second one would put the natural interpretation onto x^2 + y^2 as well.
REFERENCE - ORIGINAL CLANG GRAMMAR COMPILER Clang $XCN /* generate compiler, C++ */ /* Simple grammar for Clang level 1 P.D. Terry, Rhodes University, 1998 */ IGNORE CASE IGNORE CHR(9) .. CHR(13) COMMENTS FROM "(*" TO "*)" CHARACTERS cr = CHR(13) . lf = CHR(10) . letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . digit = "0123456789" . instring = ANY - "'" - cr - lf . TOKENS identifier = letter { letter | digit } . number = digit { digit } . string = "'" (instring | "''") { instring | "''" } "'" . PRODUCTIONS Clang = "PROGRAM" identifier ";" Block "." . Block = { ConstDeclarations | VarDeclarations } CompoundStatement . ConstDeclarations = "CONST" OneConst { OneConst } . OneConst = identifier "=" number ";" . VarDeclarations = "VAR" OneVar { "," OneVar } ";" . OneVar = identifier [ UpperBound ] . UpperBound = "[" number "]" . CompoundStatement = "BEGIN" Statement { ";" Statement } "END" . Statement = [ CompoundStatement | Assignment | IfStatement | WhileStatement | ReadStatement | WriteStatement ] . Assignment = Variable ":=" Expression . Variable = Designator . Designator = identifier [ "[" Expression "]" ] . IfStatement = "IF" Condition "THEN" Statement . WhileStatement = "WHILE" Condition "DO" Statement . Condition = Expression RelOp Expression . ReadStatement = "READ" "(" Variable { "," Variable } ")" . WriteStatement = "WRITE" [ "(" WriteElement { "," WriteElement } ")" ] . WriteElement = string | Expression . Expression = ( "+" Term | "-" Term | Term ) { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Designator | number | "(" Expression ")" . AddOp = "+" | "-" . MulOp = "*" | "/" . RelOp = "=" | "<>" | "<" | "<=" | ">" | ">=" . END Clang.