Computer Science 3 - 2001

Programming Language Translation


Practical for Week 17, beginning 13 August 2001

Hand in this prac sheet before lunch time on your next practical day, correctly packaged in a transparent folder with your solutions and the "cover sheet". Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.


Objectives

The objectives of this practical are to familiarize you with simple applications of the Coco/R parser generator, and to give you practice in writing grammars that describe simple language features.


To hand in:

This week you are required to hand in, besides the cover sheet:

I do NOT require listings of any C++ code produced by Coco/R.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 13 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box in the secretary's office for collection by Pat Terry.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so.


Task 1 - a trivial task

Unpack the prac kit PRAC17.ZIP.

In this file will be found a whole host of goodies. Among them are:


Task 2 - Simple use of Coco/R - a quick task

In the kit you will find CALC.ATG. This is essentially the calculator grammar from section 5.9.5 of the text, with a slight (cosmetic) change of name.

Use Coco/R to generate a parser for data for this calculator. You do this most simply by giving the command

CR CALC.ATG

Used like this, Coco/R will simply generate three important components of a calculator program - the parser, scanner, and main driver program. Cocol specifications can be programmed to generate a complete calculator too (ie one that will evaluate the expressions, rather than simply check them for syntactic correctness), but that will have to wait for the early hours of another day.

(Wow! Have you ever written a program so fast in your life before?)

Of course, having Coco/R write you a program is one thing. But it might also be fun and interesting to compile the program and see that it works. How you do this depends on what system you want to use.

For C++ systems you can keep firing up those awful environments, creating project files, specify all sorts of other paths and options, and generally have lots of fun getting lost, but for our purposes the best idea is to make use of the "make" system.

MAKE is a utility that allows one to specify various component parts of a large system and how they are interrelated, and to recompile and relink the system easily when part of it is updated. MAKE takes its parameters from a so-called "makefile" - if you study these you will see that a fairly understandable pattern emerges. However, generating makefiles is sometimes awkward, so I have developed a program, included in the C++ kit, that will generate you a makefile for Coco/R projects, suitable for using either Borland C++ or Visual C++, as you prefer.

Assuming that your grammar has a start symbol called XXXX, and that the file specifying the grammar is called XXXX.ATG, issuing the command

GENMAKE XXXX BORL31 (for Borland C++ 3.1)

or

GENMAKE XXXX BORL55 (for Borland C++ 5.5)

or

GENMAKE XXXX VISUAL (for Microsoft Visual C++)

will generate a file XXXX.MAK that you can then apply as a parameter to the MAKE command (for Borland C++) or as a parameter to the NMAKE command (for Visual C++). After generating the makefile (which you need do only once for each task), a command like

MAKE -f XXXX.MAK (for Borland C++ 3.1)

or

NMAKE -f XXXX.MAK (for Microsoft Visual C++)

will (a) generate the system from the grammar (b) compile the parts of the system and (c) link it all together, whereafter a command like

XXXX DATA.TXT

will run the program XXXX.EXE and try to parse the file DATA.TXT, sending error messages to the screen. Giving the command in the form

XXXX DATA.TXT /L

will send an error listing to the file DATA.LST, which might be more convenient. Try this out on the calculator:

       GENMAKE CALC BORL55                GENMAKE CALC VISUAL 
       MAKE -f CALC               or      NMAKE -f CALC 
       CALC DATA.TXT                      CALC DATA.TXT 
       CALC DATA.BAD                      CALC DATA.BAD 

The makefile also allows you some other options. Specifically

MAKE -f CALC GENERATE will simply run the Coco/R phase for CALC
MAKE -f CALC CLEAN will simply delete the OBJ and EXE files for CALC
MAKE -f CALC NEW will cleanup and remake the CALC system from scratch

Well, you did all that. Well done. What next?

For some light relief and interest you might like to look at the code the system generated for you - you don't have to comment this week, simply gaze in awe. Don't take too long over this, because now you have the chance to be more creative!

Wait - we have not finished yet. Modify the grammar so that you can use parentheses in your expressions, and also give the calculator a square-root capability.

Well, of course, it does not have any real "calculator" capability - it cannot calculate anything (yet). It only has the ability to recognise or reject expressions at this stage. Try it out with some expressions that use the new features, and some that use them incorrectly.

Warning. Language design and grammar design is easy to get wrong. Think hard about these problems before you begin, and while you are doing them.


Task 3 - A stack-assembler parser

In the prac kit you will also find EG1.STK, EG2.STK and EG3.STK - simple programs written in the simple ASSEMBLER language for the simple stack machine that terrified you last week. There are a few documented errors, however, just for fun.

Write a simple grammar in Cocol - say STK.ATG - that describes this stack assembler language, and produce a parser for it using Coco/R. Try out your parser on the specimen data files, checking that it detects the errors, and that it will accept the programs when their errors are corrected.

Note that you do not have to produce a full assembler, interpreter, or anything daring like that - simply the parser. And all you have to submit for assessment is the STK.ATG file.


Task 4 - Spoornet are looking for programmers

Derive a grammar to describe a mixed passenger and goods train, with one (or more) locomotives, then one or more goods trucks, followed either by a guard's van, or by one or more passenger coaches, the last of which should be a passenger brake van. In the interests of safety, try to build in a regulation to the effect that fuel trucks may not be marshalled immediately behind the locomotives, or immediately in front of a passenger coach.

Some sample trains may be found in the file TRAINS in the kit. See if you can generate a parser that will either accept or reject these trains.


Task 5 - One for the Musicians in our Midst (but the rest of you should do it too)

Now that you know all about Ceol Mor and how it was handed on (quickly, what was that strange word for the notation used?), you will be intrigued to learn that Scottish pipe bands often compete at events called Highland Gatherings where three forms of competition are traditionally mounted. There is the so-called "Slow into Quick March" competition, in which each band plays a single Slow March followed by a single Quick March. There is the so-called "March, Strathspey and Reel" competition, where each band plays a single Quick March, followed by a single Strathspey, and then by a single Reel; this set may optionally be followed by a further Quick March. And there is also the "Medley", in which a band plays a selection of tunes in almost any order. Each tune falls into one of the categories of March, Strathspey, Reel, Slow March, Jig and Hornpipe but, by tradition, a group of one or more Strathspeys within such a medley is always followed by a group of one or more Reels.

Develop a grammar to describe the activity at a Highland Gathering at which a number of competitions are held, and in each of which at least one band performs. Competitions are held in one category at a time, with a short break between competitors, and between events. Regard concepts like "March", "Reel" and so on as terminals - in fact there are many different possible tunes of each sort, of course, but you may have to be a piper to recognize one tune from another!

(Incidentally, the classical music of the Highland Pipe is almost never played by bands. It is a very refined form of solo music; admittedly an acquired taste.)


Task 6 - So what if Clang is so restrictive - fix it!

In the prac kit you will find the grammar for Clang taken from section 8.7.2 of the text. Generate a program from this that will recognise or reject simple Clang programs, and verify that the program behaves correctly with the two sample programs in the kit, namely VOTE.CLN and BAD.CLN.

GENMAKE CLANG BORL55
MAKE -f CLANG.MAK
CLANG VOTE.CLN

and

CLANG BAD.CLN

In an earlier practical students complined that Clang did not have AND and OR operators. Very well. Extend the grammar so that these become part of the vocabulary. Be careful: operators in expressions have relative precedences, and you will need to take these into account. If you came to this week's tutorial you should be in a good position to forge right ahead; if you did not ...

Clang really is a horrid little language, isn't it? But its simplicity means that it is easy to devise Terry Torture on the lines of "extend it". While you are at it, extend the definition of arrays to allow a named constant as an alternative to the restrictive numeric literal, and add a CASE or SWITCH statement to the language. Here's an example of such a statement that should give you some ideas:

         CASE  a+b  OF
           1 : C := d; e := f;
           2 : IF x > y THEN e := g;
               WHILE i < 10 DO i := i + 1
           3, 4, 5 :  (* do nothing *)
           DEFAULT : Read(x); Write(2*x)
         END

Note: Read that phrase again: "which should give you some ideas". And again. And again. Don't just rush in and write a grammar that will recognise only that particular statement. Think hard about what sorts of things you can see there, and think hard about how you could make your grammar fairly general.

Hint: When you get this right - when you have thought hard about it instead of just hacking for N hours - you will find that the solution is all of about four lines long! Terry's First Law of Computer Science states that beneath every over-complicated program there is a clear, simple, efficient program trying to get out.


Appendix: Practical considerations when using Coco/R

Firstly, note that the QEdit editor system has been configured for you so that you can easily edit and check Cocol files, provided that these have the standard suffix .ATG. Thus a recommended way to proceed is as follows:

Start the editor

Q GRAMM.ATG

and then edit your grammar in the usual way.

From within the QEdit editor CTRL-F9 or CTRL-F10 will invoke the Coco/R compiler CR.EXE. If there are any errors, you will be able to move from one error location to the next by using the F4 key (the reason for the error is given in the lower window), or by using ALT-1 through ALT-9 to move to the 1st through 9th errors.

You can, of course, prepare ATG files with other ASCII editors. UltraEdit has also been configured to link to Coco/R by using options in the "advanced" pull down menu - one invokes Coco/R, and another invokes Make (to use the latter you should rename your makefile as MAKEFILE). Finally, you can also run Coco/R free standing with a command like:

CR XXXX.ATG

in which case the system will produce you a listing of the grammar file and the associated error messages, if any, in a corresponding file XXXX.LST.

For ease of use with the makefile concept and with various DOS file handling conventions, please

Since you will be using C++, make sure that the grammar includes the "pragma" $X (which generates C++ code, rather than C code).

The first line of your grammar description should thus always read something like

COMPILER Name $NCX /* Standard pragmas for Pascal, Modula or C++ */

Error checking by Coco/R takes place in various stages. The first of these relates to simple syntactic errors - like leaving off a period at the end of a production. These are usually easily fixed. The second stage consists of ensuring that all non-terminals have been defined with right hand sides, that all non-terminals are "reachable", that there are no cyclic productions, no useless productions, and in particular that the productions satisfy what are known as LL(1) constraints.

We shall discuss LL(1) constraints in class in the next week, and so for this prac we shall simply hope that they do not become tiresome. The most common way of violating the LL(1) constraints is to have alternatives for a nonterminal that start with the same piece of string. This means that a so-called LL(1) parser (which is what Coco/R generates for you) cannot easily decide which alternative to take - and in fact will run the risk of going badly astray. Here is an example of a rule that violates the LL(1) constraints:

        assignment =   variableName ":=" expression 
                     | variableName index ":=" expression. 
        index      =   "[" subscript "]" . 

Both alternatives for assignment start with a variableName. However, we can easily write production rules that does not have this snag:

        assignment =   variableName [ index ] ":=" expression .
        index      =   "[" subscript "]" . 

A moment's thought will show that the various expression grammars that we have discussed in class - the left recursive rules like

        expression = term | expression "-" term . 

also violate the LL(1) constraints, and so have to be recast (see the CLANG.ATG file) as

        expression = term { "-" term } . 

to get around the problem.

For the moment, if you encounter LL(1) problems, please speak to the long suffering demonstrators who will (hopefully) be able to help you resolve them.


Home  © P.D. Terry