Hand in your solutions to the second part of this practical before lunch time on your next practical day, correctly packaged in a transparent folder with your cover sheets. Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker. Please make it clear whose folder you have used for the electronic submission, for example g15A1234. Lastly, please resist the temptation to carve up the practical, with each group member only doing one task. The group experience is best when you discuss each task together.
In this practical you are to
You will need this prac sheet and your text book. As usual, copies of the prac sheet are also available at http://www.cs.ru.ac.za/courses/CSc301/Translators/trans.htm.
When you have completed this practical you should understand
This week you are required to hand in, besides the cover sheet:
I do NOT require listings of any C# code produced by Coco/R.
Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.
You are referred to the rules for practical submission which are clearly stated in our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box outside the laboratory and not given to demonstrators.
A rule not stated there, but which should be obvious, is that you are not allowed to hand in another group's or student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so. You are expected to be familiar with the University Policy on Plagiarism, which you can consult on the University web site.
The first few tasks do not need you to use a computer, nor should you. Do them by hand.
The following Cocol grammar describes (in EBNF) a set of productions written in EBNF. For example, it could be supplied to Coco/R to produce a system that could parse a set of productions similar to the ones discussed in class (and given below the Cocol description).
COMPILER EBNF0 $CN /* Parse a set of EBNF productions - grammar only P.D. Terry, Rhodes University, 2017 */ CHARACTERS letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . lowline = "_" . digit = "0123456789" . noquote = ANY - '"' . TOKENS nonterminal = letter { letter | lowline | digit } . terminal = '"' noquote { noquote } '"' . COMMENTS FROM "(*" TO "*)" NESTED IGNORE CHR(9) .. CHR(13) PRODUCTIONS EBNF0 = { Production } EOF . Production = nonterminal "=" Expression "." . Expression = Term { "|" Term } . Term = Factor { Factor } . Factor = nonterminal | terminal | "[" Expression "]" | "(" Expression ")" | "{" Expression "}" . END EBNF0.
(a) Do the production in the EBNF grammar obey the LL(1) constraints? Justify your answer.
(b) The nonterminal definition allows a non-terminal to end with a lowline - as in my_name__. How should the definition be changed to allow lowlines within a non-terminal, but not at the end?
c) Now consider a set of productions specifying simple English sentences:
Sentence = "the" QualifiedNoun Verb | Pronoun Verb . QualifiedNoun = [ Adjective { Adjective } ] Noun . Noun = "man" | "girl" | "boy" | "lecturer" . Pronoun = "he" | "she" . Verb = "talks" | "listens" | "mystifies" . Adjective = "tall" | "thin" | "sleepy" .
The productions leading on from Sentence allow a QualifiedNoun to consist of a Noun preceded by zero or more Adjectives. But in place of the line reading
QualifiedNoun = [ Adjective { Adjective } ] Noun .
one might have been tempted to write
QualifiedNoun = [ { Adjective } Adjective ] Noun .
or even
QualifiedNoun = { Adjective } Noun .
Do any of these suggestions introduce either ambiguity or non-LL(1) compliance into the grammar for Sentence? If they do, does this mean that the EBNF parser that could be constructed by Coco/R from the Cocol code given earlier would be unable to recognise the productions defining a Sentence? Does it mean that if Coco/R was used to try to construct a parser for sentences using one or other of these suggestions for QualifiedNoun it would be unable to do so? Discuss.
The following Cocol grammar attempts to describe a sequence of Roman numbers between 1 and 10 - as you probably know, these were represented by I, II, III, IV, V, VI, VII, VIII, IX and X.
COMPILER Roman /* Parse a list of Roman numbers in the range 1 ... 10 - grammar only P.D. Terry, Rhodes University, 2017 */ IGNORE CHR(9) .. CHR(13) PRODUCTIONS Roman = OneNumber { "," OneNumber } "." . OneNumber = StartI | StartV | StartX . StartI = "I" [ [ "I" ] [ "I" ] | "V" | "X" ] . StartV = "V" [ "I" ] [ "I" ] [ "I" ] . StartX = "X" . END Roman.
Is this an LL(1) grammar? If not, why not - and if it is, make a convincing argument for it. Is it an ambiguous grammar? If so, demonstrate an ambiguity, and if not, convince a non-believer.
Palindromes are character strings that read the same from either end, like "Hannah" or my brother Peter's favourite line when he did the CTM advertisements: "Bob Bob". The following represent various attempts to find grammars that describe palindromes made only of the letters a and b:
(1) Palindrome = "a" Palindrome "a" | "b" Palindrome "b" . (2) Palindrome = "a" Palindrome "a" | "b" Palindrome "b" | "a" | "b" . (3) Palindrome = "a" [ Palindrome ] "a" | "b" [ Palindrome ] "b" . (4) Palindrome = [ "a" Palindrome "a" | "b" Palindrome "b" | "a" | "b" ] .
Which grammars achieve their aim? If they do not, explain why not. Which of them are LL(1)? Can you find other (perhaps better) grammars that describe palindromes and which are LL(1)?
Which of the following statements are true? Justify your answer.
(a) An LL(1) grammar cannot be ambiguous.
(b) A non-LL(1) grammar must be ambiguous.
(c) An ambiguous language cannot be described by an LL(1) grammar.
(d) It is possible to find an LL(1) grammar to describe any non-ambiguous language.
Hand in your solutions to tasks 1 through 4 before continuing.
There are several files that you need for the rest of the practical, zipped up this week in the file PRAC4.ZIP
Copy the prac kit into a newly created directory/folder in your file space in the usual way:
j: md prac4 cd prac4 copy i:\csc301\trans\prac4.zip unzip prac4.zip
You will find the executable version of Coco/R and batch files for running it, frame files, and various sample programs and grammars, including ones for the grammars given in tasks 1, 2 and 3.
After unpacking this kit attempt to make the parsers, as you did last week. Ask the demonstrators to show you how to get Coco/R to show you the FIRST and FOLLOW sets for the non-terminals of the grammar, and verify that the objections (if any) that Coco/R raises to these grammars are the same as you have determined by hand.
As an example of checking the LL(1) conditions for a grammar expressed in EBNF, let us consider how we might describe the daily activities of SAFM - "your news and information leader" (more realistically described as a non- stop party political broadcast). Hour by hour Sakina, Tsepiso, Nancy, Ashraf, Elvis, Bongi and their comrades present what are generously called shows. These include music and advertisements (as people pay no licence fees, the station has to raise revenue through advertising). What passes as "information" is provided by news bulletins, weather reports and talk shows. News bulletins consist of a homogeneous stream of stories, while talk shows consist of interchanges between the host and listeners who are keen enough to phone in - with the person hosting the exchange getting the first and last word on the subject, of course.
Items like "advert", "rain" and "song" are really in the category of the lexical terminals which a scanner (in the person of someone listening to the radio) would recognize as key symbols while parsing a broadcast. So one playful attempt to describe a day's activities might be on the lines of
Radio = { TalkShow | NewsBulletin | "song" | "advert" } EOF . NewsBulletin = "advert" NewsItem { NewsItem } [ Weather ] Filler . NewsItem = "ANC" [ "SACP" ] | "protest" | [ "SACP" ] "ANC" | "zuma" | "Helen" "tweet" | Scandal | Accident . TalkShow = "host" { listener "host" } [ Filler ] . Accident = "collision" "claims" "another" number "lives" . Scandal = { "GuptaLeak" } "corruption" | "Hlaudi" . Filler = "song" | "advert" . Weather = { "snow" | "rain" | "cloudy" | "windy" } .
Analyze this grammar in detail. Treat listener and number as terminals. One way is to begin by recasting it in a form that introduces further non-terminals that allow for the elimination of the [ ] and { } meta-brackets, but you can probably do the analysis along the lines of the methods suggested on pages 95 - 96. If it proves out to be non-LL(1), try to find an essentially equivalent description that is LL(1), or argue why this should be impossible.
The weather bulletins could be improved if one was informed of the town to which the prediction applied. Suggest how the grammar would need modification to do this. As with listener and number, regard a town name as a terminal.
Develop a Cocol grammar PVMASM,ATG that describes programs written in the PVM code that you struggled with in Practical 2. That should be pretty easy but be careful to describe the language as tightly as possible. Then, just to make the language more attractive, suppose you had been able to use optional alphanumeric labels in your code as the destinations of branching instructions - as exemplified below. Modify the grammar to handle this.
DSP 2 LOOP LDA 1 INPI LDL 1 BZE EXIT BRN LOOP EXIT LDL 1 PRNI BRN 2 HALT
Of course your parser-only system won't actually be able to assemble the code or generate PVM code for the interpreter. In a few weeks time we might extend it further to do that - for the moment simply describe the language.
Wait! we are not finished yet. You might have noticed that the assembler you used a few weeks ago also generated a .COD file which, for the example above, would have looked like this
ASSEM BEGIN { 0 } DSP 2 { 2 } LDA 1 { 4 } INPI { 5 } LDL 1 { 7 } BZE 11 { 9 } BRN 2 { 11 } LDL 1 { 13 } PRNI { 14 } BRN 2 { 16 } HALT END.
What you might not have noticed was that this .COD file could also be assembled by the assembler - the addresses in { braces } were treated as comments. Now that you have been told this, ensure that the language you describe can handle this feature as well.
Develop a Cocol grammar BOOL.ATG for describing a list of Boolean expressions like those you should remember from a Boolean Algebra course. In this context, allow your Boolean expressions to be terminated with an = operator, and to include parentheses, single-letter variables, the constants FALSE and TRUE (sometimes written as 0 and 1), and the operators AND (written either as AND or "&" or as a "dot", or simply omitted), OR (written either as OR or as "|" or as a "plus") and NOT, written either as a prefix NOT or as a suffix apostrophe. So some examples of Boolean expressions might be (these are in the file BOOL.TXT)
a AND B OR (C OR NOT D) = a . b + (c + d') = a b + (c + d') = NOT (a OR b) AND TRUE = (a + b)' . 1 = (a + b)' AND 1 = b AND NOT C AND D = b . c' . d = b c' d =
Note that there is a precedence ordering among Boolean operators - parentheses take precedence over NOT, which takes precedence over AND, which takes precedence over OR.
Test your parser out on some examples like those just shown.
And here is something to think about. Draw out the parse tree that would correspond to the expression NOT (A OR B). Then draw out the parse tree that would correspond to NOT A AND NOT B. De Morgan will tell you that these expressions mean the same thing. Do you get the same tree in each case? Discuss the implications.
Hand in sheet - Available on the web in the ASCII file HANDIN.TXT for you to copy and edit, adding your answers. Task 1 - Self-describing ENBF Does the grammar as supplied obey the LL(1) constraints? Why/Why not? How do you define a terminal to allow internal lowlines (underscores) but not allow one or more at the end? Comment on whether the suggested changes to the productions for QualifiedNoun introduce non-LL(1) conformance or ambiguity. Task 2 - Ambiguity Is the grammar for Roman number sqquences an LL(1) grammar? Justify your answer. Is the suggested grammar ambiguous? Justify your answer. If the grammar is unsuitable according to the answers to the first questions, find a better grammar that does not suffer from the problems you have identified? Task 3 - Palindromes Does grammar 1 describe palindromes? If not, why not? Is it an LL(1) grammar? If not, why not? Does grammar 2 describe palindromes? If not, why not? Is it an LL(1) grammar? If not, why not? Does grammar 3 describe palindromes? If not, why not? Is it an LL(1) grammar? If not, why not? Does grammar 4 describe palindromes? If not, why not? Is it an LL(1) grammar? If not, why not? Can you find a better grammar to describe palindromes? If so, give it, if not, explain why not. Task 4 Which of the following statements are true? Justify your answers. (a) An LL(1) grammar cannot be ambiguous. (b) A non-LL(1) grammar must be ambiguous. (c) An ambiguous language cannot be described by an LL(1) grammar. (d) It is possible to find an LL(1) grammar to describe any non-ambiguous language.
Home © P.D. Terry