The following Cocol grammar may be familiar. It describes a set of EBNF productions that can incorporate Wirth's metabrackets { } [ and ].
COMPILER EBNF $CN /* Parse a set of EBNF productions P.D. Terry, Rhodes University, 2005 */ CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . digit = "0123456789" . noquote1 = ANY - "'" . noquote2 = ANY - '"' . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' . COMMENTS FROM "<" TO ">" IGNORE CHR(9) .. CHR(13) PRODUCTIONS EBNF = { Production } EOF . Production = nonterminal "=" Expression "." . Expression = Term { "|" Term } . Term = Factor { Factor } . Factor = nonterminal | terminal | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" . END EBNF.
Show how the productions would be attributed so as to parse a set of productions given in the Wirth notation and reproduce them one to a line in the alternative EBNF notation that uses Kleene closure symbol * and e. For example, a production like
Program = [ Header ] { Statement } .
should be transformed to
Program = ( Header | e ) ( Statement )* .
You can find a Java "kit" for the following solutions on the course web page as the file tut28.zip.
Conversion of the productions is very easy; one simply writes the tokens as they are parsed, interspersed with blank characters, and substituting one form of bracketing for another. A simple solution simply ignores comments:
import Library.*; COMPILER EBNF1 $CN /* Parse a set of EBNF productions and convert to one form of EBNF - This version simply ignores comments P.D. Terry, Rhodes University, 2005 */ CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . digit = "0123456789" . noquote1 = ANY - "'" . noquote2 = ANY - '"' . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' . COMMENTS FROM "<" TO ">" IGNORE CHR(9) .. CHR(13) PRODUCTIONS EBNF1 = { Production } EOF . Production = nonterminal (. IO.write(token.val + " "); .) "=" (. IO.write("= "); .) Expression "." (. IO.writeLine(". "); .) . Expression = Term { "|" (. IO.write("| "); .) Term } . Term = Factor { Factor } . Factor = nonterminal (. IO.write(token.val + " "); .) | terminal (. IO.write(token.val + " "); .) | "[" (. IO.write("( "); .) Expression "]" (. IO.write("| eps ) "); .) | "(" (. IO.write("( "); .) Expression ")" (. IO.write(") "); .) | "{" (. IO.write("( "); .) Expression "}" (. IO.write(")* "); .) . END EBNF1.
To be able to retain the < comments in a set of productions > one has to resort to the idea of declaring a suitable token pattern for them as a pragma (see page 250). A similar trick can be used to retain line breaks.
At first this seems very simple, leading to the following code. Note that the text of a pragma is stored in the "look ahead token" (la.val), not the most recent token (token.val).
import Library.*; COMPILER EBNF2 $CN /* Parse a set of EBNF productions and convert to another form of EBNF using the Kleene star and explicit eps - This version handles comments and line ends, but writes them in the wrong places! P.D. Terry, Rhodes University, 2005 */ CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . digit = "0123456789" . noquote1 = ANY - "'" . noquote2 = ANY - '"' . inComment = ANY - ">" . eol = CHR(10) . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' . PRAGMAS Comments = "<" { inComment } ">" . (. IO.write(" " + la.val + " "); .) EOL = eol . (. IO.writeLine(); .) IGNORE CHR(0) .. CHR(31) - CHR(10) PRODUCTIONS EBNF2 = { Production } EOF . Production = nonterminal (. IO.write(token.val, -14); .) "=" (. IO.write(" = "); .) Expression "." (. IO.writeLine("."); .) . Expression = Term { "|" (. IO.write("| "); .) Term } . Term = Factor { Factor } . Factor = nonterminal (. IO.write(token.val + " "); .) | terminal (. IO.write(token.val + " "); .) | "[" (. IO.write("( "); .) Expression "]" (. IO.write("| eps ) "); .) | "(" (. IO.write("( "); .) Expression ")" (. IO.write(") "); .) | "{" (. IO.write("( "); .) Expression "}" (. IO.write(")* "); .) . END EBNF2.
However, the "look ahead" property requires a more subtle trick - we need to delay writing the pragma until after the next token has been parsed. The trick is to accumulate the pragma text into a string and redirect all output operations through a simple method that will output the token followed by the pragma!
import Library.*; COMPILER EBNF3 $CN /* Parse a set of EBNF productions and convert to another form of EBNF using the Kleene star and explicit eps - This version handles comments and line ends, adds them to a pragma string so that they can be written in the correct places P.D. Terry, Rhodes University, 2005 */ static String pragmaStr = ""; /* starts empty */ static void write(String str) { IO.write(str + pragmaStr); pragmaStr = ""; /* reset the pragma string */ } CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . digit = "0123456789" . noquote1 = ANY - "'" . noquote2 = ANY - '"' . inComment = ANY - ">" . eol = CHR(10) . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = "'" noquote1 { noquote1 } "'" | '"' noquote2 { noquote2 } '"' . PRAGMAS Comments = "<" { inComment } ">" . (. pragmaStr += " " + la.val + " "; .) EOL = eol . (. pragmaStr += "\r\n"; .) IGNORE CHR(0) .. CHR(31) - CHR(10) PRODUCTIONS EBNF3 = { Production } EOF . Production = nonterminal (. write(token.val); .) "=" (. write(" = "); .) Expression "." (. write(".\r\n"); .) . Expression = Term { "|" (. write("| "); .) Term } . Term = Factor { Factor } . Factor = nonterminal (. write(token.val + " "); .) | terminal (. write(token.val + " "); .) | "[" (. write("( "); .) Expression "]" (. write("| eps ) "); .) | "(" (. write("( "); .) Expression ")" (. write(") "); .) | "{" (. write("( "); .) Expression "}" (. write(")* "); .) . END EBNF3.