Complete sources to these solutions can be found on the course WWW pages in the files PRAC21A.ZIP or PRAC21AC.ZIP
Exponentiation is a stronger operation than multiplication, so it has to be introduced deeper into the hierarchy (in fact this is discussed in the textbook, if you'd only thought of looking!) Functions like sqrt also take precedence over other operations:
COMPILER Calc $CN /* Simple four function calculator - extended P.D. Terry, Rhodes University, 2008 */ CHARACTERS digit = "0123456789" . hexdigit = digit + "ABCDEF" . TOKENS decNumber = digit { digit } . hexNumber = "$" hexdigit { hexdigit } . IGNORE CHR(0) .. CHR(31) PRODUCTIONS Calc = { Expression "=" } EOF . Expression = Term { "+" Term | "-" Term } . Term = Factor { "*" Factor | "/" Factor } . Factor = Primary [ "^" Factor ] . Primary = decNumber | hexNumber | "(" Expression ")" | "sqrt" "(" Expression ")" . END Calc.
The solutions received were mixed. Some didn't capture the idea that there should be N competitions each with M bands. A typical over simplified attempt looks like this
Gath1 = { Competition } . Competition = SlowQuick | MSR | Medley . SlowQuick = "SlowMarch" "March" . MSR = "March" "Strathspey" "Reel" [ "March" ] . Medley = OneTune { OneTune } . OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" .
If we want to introduce the idea that there are multiple competitions with multiple bands, then it might make sense to try
COMPILER Gath2 $CN /* Describes the Pipe Band events at a Highland Gathering P.D. Terry, Rhodes University, 2008 */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS Gath2 = { Competition } . Competition = Band { Band } . Band = SlowQuick | MSR | Medley . SlowQuick = "SlowMarch" "March" . MSR = "March" "Strathspey" "Reel" [ "March" ] . Medley = OneTune { OneTune } . OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" . END Gath2.
This is badly non LL(1) (It represents a continual wail of noise). We need to get some sort of break between bands at least:
COMPILER Gath3 $CN /* Describes the Pipe Band events at a Highland Gathering P.D. Terry, Rhodes University, 2008 */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS Gath3 = { "AnnounceCompetition" Competition } . Competition = Band { Band } . Band = "AnnounceBand" ( SlowQuick | MSR | Medley ). SlowQuick = "SlowMarch" "March" "break" . MSR = "March" "Strathspey" "Reel" [ "March" ] "break" . Medley = OneTune { OneTune } "break" . OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" . END Gath3.
Even this is non-LL(1). We can get an LL(1) grammar if we make the right kind of announcements:
COMPILER Gath4 $CN /* Describes the Pipe Band events at a Highland Gathering P.D. Terry, Rhodes University, 2008 */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS Gath4 = { "AnnounceCompetition" Competition } . Competition = Band { Band } . Band = "AnnounceSlow" SlowQuick | "AnnounceMSR" MSR | "AnnounceMedley" Medley . SlowQuick = "SlowMarch" "March" "break" . MSR = "March" "Strathspey" "Reel" [ "March" ] "break" . Medley = OneTune { OneTune } "break" . OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" . END Gath4.
But none of the above capture the idea that all the bands in one competition must play the same kind of "set" (as they are called). So here is a much better solution:
COMPILER Gath5 $CN /* Describes the Pipe Band events at a Highland Gathering P.D. Terry, Rhodes University, 2008 */ IGNORE CHR(0) .. CHR(31) PRODUCTIONS Gath5 = { Competition } . Competition = "AnnounceSlow" SlowQuickComp | "AnnounceMSR" MSRComp | "AnnounceMedley" MedleyComp . SlowQuickComp = SlowQuick { SlowQuick } . SlowQuick = "SlowMarch" "March" "break" . MSRComp = MSR { MSR } . MSR = "March" "Strathspey" "Reel" [ "March" ] "break" . MedleyComp = Medley { Medley } . Medley = OneTune { OneTune } "break" . OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" . END Gath5.
In all these notice that we have not fallen into the trap of defining:
OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | "Reel" | "Strathspey" { "Strathspey" } "Reel" { "Reel" } .
which is non-LL(1) again, or of trying to write
OneTune = "March" | "SlowMarch" | "Jig" | "Hornpipe" | | "Strathspey" { "Strathspey" } "Reel" { "Reel" } .
which is LL(1), but precludes reels being played without a strathspey immediately before them.
As hinted in the prac sheet, the trick here is to rework the productions that use meta braces for repetition by ones that use right recursion (to avoid LL(1) errors. Here is one possibility:
COMPILER EBNF $CN /* Parse a set of EBNF productions P.D. Terry, Rhodes University, 2008 */ CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . control = CHR(0) .. CHR(31) . digit = "0123456789" . noquote1 = ANY - "'" - control . noquote2 = ANY - '"' - control . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' . COMMENTS FROM "(*" TO "*)" NESTED IGNORE control PRODUCTIONS EBNF = Productions EOF . Productions = Production Productions | . Production = nonterminal "=" Expression "." . Expression = Term MoreTerms . MoreTerms = "|" Term MoreTerms | . Term = Factor MoreFactors . MoreFactors = Factor MoreFactors | . Factor = nonterminal | terminal | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" . END EBNF.
The above grammar matches the one given in the prac sheet. It does not, however, have the property of being able to describe itself any longer - we might argue that we need to be able to describe a nullable factor (the original could not do this). This might be achieved as follows:
... PRODUCTIONS EBNF = Productions EOF . Productions = Production Productions | . Production = nonterminal "=" Expression "." . Expression = Term MoreTerms . MoreTerms = "|" Term MoreTerms | . Term = Factor Term | . Factor = nonterminal | terminal | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" . END EBNF.
but notice that this also allows one to write production rules like
A = b | c | | | .
which you might argue is a bit silly. Further reflection on this is left as a useful exercise!
The Parva extensions produced some interesting submissions. Many of them (understandably!) were too restrictive in certain respects, while others were too permissive. Here is a suggested solution:
COMPILER Parva $CN /* Parva level 1.5 grammar - Prac 21 extensions P.D. Terry, Rhodes University, 2008 Grammar only */ CHARACTERS lf = CHR(10) . backslash = CHR(92) . control = CHR(0) .. CHR(31) . letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . digit = "0123456789" . nonzerodigit = "123456789" . stringCh = ANY - '"' - control - backslash . charCh = ANY - "'" - control - backslash . printable = ANY - control . TOKENS /* Insisting that identifiers cannot end with an underscore is quite easy */ identifier = letter { letter | digit | "_" { "_" } ( letter | digit ) } . /* but a simpler version is what most people thought of identifier = letter { letter | digit | "_" ( letter | digit ) } . */ /* insisting that a number cannot have extraneous leading zeros */ number = "0" | nonzerodigit { digit } . stringLit = '"' { stringCh | backslash printable } '"' . charLit = "'" ( charCh | backslash printable ) "'" . COMMENTS FROM "//" TO lf COMMENTS FROM "/*" TO "*/" IGNORE CHR(9) .. CHR(13) PRODUCTIONS Parva = "void" identifier "(" ")" Block . Block = "{" { Statement } "}" . /* We need some more nonterminals for the new statement forms */ Statement = Block | ConstDeclarations | VarDeclarations | Assignment | IncOrDecStatement | IfStatement | WhileStatement | RepeatStatement | ForStatement | ReturnStatement | HaltStatement | ReadStatement | WriteStatement | ";" . /* Declarations remain the same as before */ ConstDeclarations = "const" OneConst { "," OneConst } ";" . OneConst = identifier "=" Constant . Constant = number | charLit | "true" | "false" | "null" . VarDeclarations = Type OneVar { "," OneVar } ";" . OneVar = identifier [ "=" Expression ] . /* To deal with statements like i = 6; i, j = 4, k; and i++; we need to manipulate the production for Assignment if we want to preserve an LL(1) grammar */ Assignment = Designator ( { "," Designator } "=" Expression { "," Expression } | "++" | "--" ) ";" . /* Prefix increment/decrement statements like ++i; or ++list[7]; cause no LL(1) problems */ IncOrDecStatement = ( "++" | "--" ) Designator ; . /* In all these it is useful to maintain generality by using Designator, not identifier */ Designator = identifier [ "[" Expression "]" ] . /* The if-then-elsif-else construction is most easily described as follows. Although this is not LL(1), this works admirably - it is simply the well-known dangling else ambiguity, which the parser resolves by associating elsif and else clauses with the most recent if */ IfStatement = "if" "(" Condition ")" Statement { "elsif" "(" Condition ")" Statement } [ "else" Statement ] . /* The Pascal-like "repeat" statement is almost trivial. Note that we can make use of several Statements between "repeat" and "until" (different in Java!) */ RepeatStatement = "repeat" { Statement } "until" "(" Condition ")" ";" . /* The for statement is straightforward, but introduces the concept of a list of expressions which is also used in the production for Expression itself */ ForStatement = "for" Designator "in" ExpList Statement . ExpList = "(" Range { "," Range } ")" . Range = Expression [ ".." Expression ] . /* Most of the rest of the grammar remains unchanged: */ WhileStatement = "while" "(" Condition ")" Statement . ReturnStatement = "return" ";" . HaltStatement = "halt" ";" . ReadStatement = "read" "(" ReadElement { "," ReadElement } ")" ";" . ReadElement = stringLit | Designator . WriteStatement = "write" "(" WriteElement { "," WriteElement } ")" ";" . WriteElement = stringLit | Expression . Condition = Expression . /* The basic form of Expression introduces "in", effectively as another relational operator with the same precedence as the other relational operators */ Expression = AddExp [ RelOp AddExp | "in" ExpList ] . AddExp = [ "+" | "-" ] Term { AddOp Term } . Term = Factor { MulOp Factor } . Factor = Designator | Constant | "new" BasicType "[" Expression "]" | "!" Factor | "(" Expression ")" . Type = BasicType [ "[]" ] . BasicType = "int" | "bool" . AddOp = "+" | "-" | "||" . /* The % operator has the same precedence as other multiplicative operators */ MulOp = "*" | "/" | "%" | "&&" . RelOp = "==" | "!=" | "<" | "<=" | ">" | ">=" . END Parva.
The wheels came off in many solutions. It is quite hard to get right, and one cannot easily find an LL(1) grammar that really matches the problem as set. Your situation was not helped by a rather poorly phrased question, which confused some groups. I apologise. Here is a simple first attempt, but with no safety regulations.
COMPILER Train1 $CN /* Grammar for simple railway trains P.D. Terry, Rhodes University, 2008 */ IGNORECASE COMMENTS FROM "(*" TO "*)" NESTED IGNORE CHR(0) .. CHR(31) PRODUCTIONS Train1 = { OneTrain } EOF . OneTrain = LocoPart ( Passengers | FreightOrMixed ) "." . LocoPart = "loco" { "loco" } . FreightOrMixed = Truck { Truck } ( "guard" | Passengers ) . Passengers = { "coach" } "brake" . Truck = "coal" | "open" | "cattle" | "fuel" | "cold" . END Train1.
Here is an attempt at safety. But this one insists on at least two safe trucks in any train, and is not LL(1)
PRODUCTIONS Train2 = { OneTrain } EOF . OneTrain = LocoPart ( Passengers | FreightOrMixed ) "." . LocoPart = "loco" { "loco" } . FreightOrMixed = SafeTruck { AnyTruck } LastPart . LastPart = "guard" | SafeTruck Passengers . Passengers = { "coach" } "brake" . SafeTruck = "coal" | "open" | "cattle" | "cold" . AnyTruck = SafeTruck | "fuel" . END Train2.
Why is it not LL(1)? We could apply all the theory of chapter 7, but maybe an example will suffice. Suppose we have a valid train like
loco coal coal coal coal coach brake
The first coal truck is parsed by the leading SafeTruck in GoodsPart. The next two coal trucks must be parsed by the repetitive part { AnyTruck }, but you can probably see that the last coal truck would have to be parsed by the alternative within LastPart. Unfortunately an LL(1) parser can't see far enough ahead to make that decision, and would be tempted to treat this last coal truck as part of the { AnyTruck } sequence.
Here is one that is LL(1)
PRODUCTIONS Train3 = { OneTrain } EOF . OneTrain = LocoPart ( Passengers | FreightOrMixed ) "." . LocoPart = "loco" { "loco" } . FreightOrMixed = SafeTruck MoreTrucks HumanPart . MoreTrucks = { FuelTruck { FuelTruck } SafeTruck | SafeTruck } . HumanPart = "guard" | Passengers . Passengers = { "coach" } "brake" . SafeTruck = "coal" | "open" | "cattle" | "cold" . FuelTruck = "fuel" . END Train3.
where you might notice that we have used a production rather like that used in describing the sequence of tunes in the Medley competition in Task 4. At first you might thing that this is, at last, a correct solution. But no, it isn't quite. This solution does not allow you to have a train like
loco loco open fuel fuel guard .
as the last fuel truck in a sequence has to be followed by at least one safe truck. The grammar does, however, allow trains like
loco open coach coach brake .
with only one truck in the goods section.
It is remarkable that something that at first sight looks so simple should turn out to be frustratingly difficult. Not being able to find an LL(1) grammar is not a train smash - one quite often cannot find an LL(1) grammar for a language, and for some years the LL(1) solution to this problem eluded us all. But it's usually worth a try, as parsers for LL(1) grammars are so easy to write. Here is a solution that seems to meet all requirements.
PRODUCTIONS Train4 = { OneTrain } EOF . OneTrain = LocoPart [ SafeLoad | "guard" | Passengers ] "." . LocoPart = "loco" { "loco" } . Passengers = { "coach" } "brake" . SafeLoad = SafeTruck RestSafeLoad . RestSafeLoad = SafeLoad | "guard" | Passengers | Fuel . Fuel = FuelTruck { FuelTruck } ( SafeLoad | "guard" ) . SafeTruck = "coal" | "open" | "cattle" | "cold" . FuelTruck = "fuel" . END Train4.
I was encouraged to see that many groups solved this problem, or got very close to doing so.