Hand in your final solutions to this practical before lunch time on your next practical day, correctly packaged in a transparent folder with your cover sheets. Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker. Please make it clear whose folder you have used for the electronic submission, for example g15A1234. Lastly, please resist the temptation to carve up the practical, with each group member only doing one task. The group experience is best when you discuss each task together.
In this practical you are to
When you have completed this practical you should understand
This week you are required to hand in, besides the cover sheets (one per group member, please!):
Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.
You are referred to the rules for practical submission which are clearly stated in our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box outside the laboratory and not given to demonstrators.
A rule not stated there, but which should be obvious, is that you are not allowed to hand in another group's or student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so. You are expected to be familiar with the University Policy on Plagiarism, which you can consult on the Rhodes website.
WARNING. This exercise really requires you to do some real thinking and planning. Please do not just sit at a computer and hack away as most of you are wont to do. Sit in your groups and discuss your ideas with one another and with the demonstrators. If you don't do this you will probably find that the whole exercise turns into a nightmare, and I don't want that to happen.
Unpack the prac kit PRAC5.ZIP. In it you will find the skeleton of a system adapted for intermediate testing of a scanner (and to which you will later add a parser), and some simple test data files - but you really need to learn to develop your own test data.
Here is a grammar for most of our pseudo-Fortran subset:
COMPILER ToyFortran /* ToyFortran level 1 grammar - Coco/R for C# P.D. Terry, Rhodes University, 2017 */ IGNORECASE CHARACTERS lf = CHR(10) . control = CHR(0) .. CHR(31) . letter = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" . digit = "0123456789" . stringCh = ANY - "'" - control . printable = ANY - control . TOKENS identifier = letter { letter | digit } . number = digit { digit } . stringLit = "'" { stringCh } "'" . eol = lf . IGNORE CHR(0) .. CHR(9) + CHR(11) .. CHR(31) PRODUCTIONS ToyFortran = { eol } "PROGRAM" Ident EOLS { VarDeclarations } { GeneralStatement } "END" EOLS . EOLS = SYNC eol { eol } [ Label ] . VarDeclarations = ( "INTEGER" | "LOGICAL" ) OneVar { WEAK "," OneVar } EOLS . OneVar = Ident [ "(" IntConst ")" ] . GeneralStatement /* Some statements are not permitted within a logical IF statement */ = LimitedStatement | IfStatement | DoStatement . LimitedStatement /* These are all permitted as the subsidiary within a logical IF statement */ = SYNC ( Assignment | GoToStatement | ReadStatement | PrintStatement | "STOP" | "CONTINUE" ) EOLS . Assignment = Designator "=" Expression . Designator = Ident [ "(" Expression ")" ] . IfStatement = "IF" "(" Expression ")" ( Label "," Label "," Label EOLS | ( LimitedStatement | "THEN" EOLS { GeneralStatement } [ "ELSE" EOLS { GeneralStatement } ] "ENDIF" EOLS ) ) . DoStatement = "DO" Label [ "," ] Ident "=" Expression "," Expression [ "," Expression ] EOLS . GoToStatement = ( "GOTO" | "GO" "TO" ) ( Label | "(" Label { WEAK "," Label } ")" [ "," ] Expression ) . ReadStatement = "READ" "*" { WEAK "," ReadElement } . ReadElement = Designator . PrintStatement = ( "PRINT0" | "PRINT" ) "*" { WEAK "," PrintElement } . PrintElement = stringLit | Expression . Expression = AndExp { ".OR." AndExp } . AndExp = NotExp { ".AND." NotExp } . NotExp = [ ".NOT." ] RelExp . RelExp = AddExp [ RelOp AddExp ] . AddExp = [ AddOp ] MultExp { AddOp MultExp } . MultExp = Factor { MulOp Factor } . Factor = Designator | Constant | "(" Expression ")" . Constant = IntConst | ".TRUE." | ".FALSE." . AddOp = "+" | "-" . MulOp = "*" | "/" . RelOp = ".LT." | ".LE." | ".GT." | ".GE." | ".EQ." | ".NE." . Ident = identifier . IntConst = number . Label = number . END ToyFortran.
Tempting as it might be simply to use Coco/R to produce a program that will parse ToyFortran programs, this week we should like you to produce such a system more directly, by developing a program in the spirit of the one you will find in the textbook in chapter 8.2.
The essence of this program is that it will eventually have a main method that will
In this practical you are to develop such a scanner and parser, which you should try in easy stages. So for Task 2, study the grammar above and the skeleton program from the kit (ToyFortran.cs) as shown below. In particular, note how the character handler section has been programmed.
// Do learn to insert your names and a brief description of what the program is supposed to do! // This is a skeleton program for developing a parser for ToyFortran programs // P.D. Terry, Rhodes University, 2017 using Library; using System; using System.Text; class Token { public int kind; public string val; public Token(int kind, string val) { this.kind = kind; this.val = val; } // constructor } // Token class ToyFortran { // +++++++++++++++++++++++++ File Handling and Error handlers ++++++++++++++++++++ static InFile input; static OutFile output; static string NewFileName(string oldFileName, string ext) { // Creates new file name by changing extension of oldFileName to ext int i = oldFileName.LastIndexOf('.'); if (i < 0) return oldFileName + ext; else return oldFileName.Substring(0, i) + ext; } // NewFileName static void ReportError(string errorMessage) { // Displays errorMessage on standard output and on reflected output Console.WriteLine(errorMessage); output.WriteLine(errorMessage); } // ReportError static void Abort(string errorMessage) { // Abandons parsing after issuing error message ReportError(errorMessage); output.Close(); System.Environment.Exit(1); } // Abort // +++++++++++++++++++++++ token kinds enumeration +++++++++++++++++++++++++ const int noSym = 0, EOFSym = 1; // and others like this // +++++++++++++++++++++++++++++ Character Handler ++++++++++++++++++++++++++ const char EOF = '\0'; const char EOL = '\n'; static bool atEndOfFile = false; // Declaring ch as a global variable is done for expediency - global variables // are not always a good thing static char ch; // look ahead character for scanner static void GetChar() { // Obtains next character ch from input, or CHR(0) if EOF reached // Reflect ch to output if (atEndOfFile) ch = EOF; else { ch = input.ReadChar(); atEndOfFile = ch == EOF; if (!atEndOfFile) output.Write(ch); } } // GetChar // +++++++++++++++++++++++++++++++ Scanner ++++++++++++++++++++++++++++++++++ // Declaring sym as a global variable is done for expediency - global variables // are not always a good thing static Token sym; static void GetSym() { // Scans for next sym from input while (ch > EOF && ch <= ' ') GetChar(); StringBuilder symLex = new StringBuilder(); int symKind = noSym; // ++++++++++++++++++++++++++++++++++++++++ over to you! sym = new Token(symKind, symLex.ToString()); } // GetSym /* ++++ Commented out for the moment // +++++++++++++++++++++++++++++++ Parser +++++++++++++++++++++++++++++++++++ static void Accept(int wantedSym, string errorMessage) { // Gets next token if the current token matches wantedSym if (sym.kind == wantedSym) GetSym(); else Abort(errorMessage); } // Accept static void Accept(IntSet allowedSet, string errorMessage) { // Gets next token if the current token matches one of allowedSet if (allowedSet.Contains(sym.kind)) GetSym(); else Abort(errorMessage); } // Accept static void Program() { } ++++++ */ // +++++++++++++++++++++ Main driver function +++++++++++++++++++++++++++++++ public static void Main(string[] args) { // Open input and output files from command line arguments if (args.Length == 0) { Console.WriteLine("Usage: ToyFortran FileName"); System.Environment.Exit(1); } input = new InFile(args[0]); output = new OutFile(NewFileName(args[0], ".out")); GetChar(); // Lookahead character // To test the scanner we can use a loop like the following: do { GetSym(); // Lookahead symbol OutFile.StdOut.Write(sym.kind, 3); OutFile.StdOut.WriteLine(" " + sym.val); // See what we got } while (sym.kind != EOFSym); /* After the scanner is debugged we shall substitute the following for that loop: GetSym(); // Lookahead symbol Program(); // Start to parse from the goal symbol // if we get back here everything must have been satisfactory Console.WriteLine("Parsed correctly"); */ output.Close(); } // Main } // ToyFortran
Next, develop the scanner by completing the GetSym method, whose goal in life is to recognize tokens. Tokens for this application could be defined by an enumeration of the following, each defined by a unique number:
noSym, EOFSym, EOLSym, identSym, numSym, lparenSym, rparenSym, DOSym, IFSym ...
As you can see from the grammar given above, Fortran was a "line oriented" language. Unlike the languages you have used previously where the ends of lines are insignificant, you wmust for EOL as a distinctive token.
The scanner can (indeed, must) initially be developed on the pretext that an initial character ch has been read. When called, it must (if necessary) read past any "white space" in the input file until it comes to a character that can form part (or all) of a token. It must then read as many characters as are required to identify a token, and assign the corresponding value from the enumeration to the kind field of an object called, say, sym - and then read the next character ch (remember that the parsers we are discussing always look one position ahead in the source).
Test the scanner with a program derived from the skeleton, which should be able to scan the data file and simply tell you what tokens it can find, using the simple loop in the Main method as supplied. At this stage do not construct the parser, or attempt to deal with comments. A simple data file for testing can be found in the file test1.for, one that simply has a list of all tokens in test2.for, and a trickier one in test3.for and you can (and probably should) invent a few more for yourself.
You can compile your program by giving the command
csharp ToyFortran.cs
and can run it by giving a command like
ToyFortran test1.for or ToyFortran test2.for
Next, refine the scanner so that it can deal with (that is, safely ignore) comments in Toy Fortran programs. A suitable data file for testing is to be found in the file test4.for, another in test5.for.
Comments in Fortran were written with a C in column 1 of a line, as in the examples. Labels on statements (where needed) occurred in columns 1 through 5, and statements themselves were written, one to a line in columns 7 through 72. But allow your system to be free format, other than trying to match the requirement that comments start with a C in column 1 and that the ends of lines mark the ends of statements.
There were some other peculiarities that you might like to deal with. Fortran identifiers could have a maximum of 6 characters, and Fortran string literals were delimited by single quotes - if a single quote were needed within the string this was specified by two single quotes in succession as in 'That''s enough to blow a student''s mind'. Pascal used this convention too - see if you can handle it in your scanner. The file test6.for can act as a test case.
You cannot possibly expect to start on Task 5 until such time as the scanner is working properly, so test it thoroughly, please!
Task 5 is to develop the associated parser as a set of routines, one for each of the non-terminals suggested in the grammar above. These methods should, where necessary, simply call on the GetSym scanner routine to deliver the next token from the input. As discussed in chapter 8, the system hinges on the premise that each time a parsing routine is called (including the initial call to the goal routine) there will already be a token waiting in the variable sym, and whenever a parsing routine returns, it will have obtained the follower token in readiness for the caller to continue parsing (see discussion on page 103). It is to make communication between all these routines easy that we declare the lookahead character ch and the lookahead token sym to be fields "global" to the ToyFortran class.
Of course, anyone can write a recognizer for input that is correct. The clever bit is to be able to spot incorrect input, and to react by reporting an appropriate error message. For the purposes of this exercise it will be sufficient to make use of the accept routines that you see in the supplied code, ones that simply issue a stern error message, close the output file, and then abandons parsing altogether.
Something to think about: If you have been following the lectures, you will know that associated with each nonterminal A is a set FIRST(A) of the terminals that can appear first in any string derived from A. Alarums and excursions (as they say in the classics). So that's why we learned to use the IntSet class in practical 1!
You are reminded of the web pages like
http://www.cs.ru.ac.za/courses/CSc301/Translators/sets.htm http://www.cs.ru.ac.za/courses/CSc301/Translators/inout.htm
where you will find useful summaries specifications of library routines especially developed for this course.
To test your parser you might like to make use of the many toy programs supplied in the prac kit. Some of these have deliberate errors, so be careful. Parsing the incorrect ones will be a little more frustrating unless you add syntax error recovery (not required, but feel free to experiment), as the parser will simply stop as soon as it finds the first error. You might like to create a number of "one-liner" data files to make this testing stage easier. Above all, do test your program thoroughly. Any parser for a real compiler that has bugs is almost useless!
C: Sieve2.for C: Sieve of Eratosthenes for finding primes 2 <= n <= 400 (Toy Fortran version) C: P.D. Terry, Rhodes University, 2017 program sieve2 logical uncrossed(4000) C: counters integer max, i, n, k, it, iterations, primes logical display max = 4000 primes = 0 print0 *, 'How many iterations?' read *, iterations display = iterations .eq. 1 print0 *, 'Supply largest number to be tested ' read *, n if (n .le. max) goto 10 print *, 'That''s too large, sorry' stop 10 do 90 it = 1, iterations primes = 0 print *, 'Prime numbers between 2 and ' , n print *, '-----------------------------------' i = 2 C: clear sieve do 30 i = 2, n 30 uncrossed(i) = .true. i = 2 C: the passes over the sieve 50 if (i .gt. n) goto 90 if (.not. uncrossed(i)) goto 70 primes = primes + 1 if (display) then print0 *, i if (primes .eq. (primes / 8) * 8) print * endif C: now cross out multiples of i k = i 60 uncrossed(k) = .false. k = k + i if (k .le. n) goto 60 70 i = i + 1 goto 50 90 continue print * print *, primes, ' primes' stop end