Computer Science 3 - 2000

Programming Language Translation


Practical for Week 22, beginning 25 September 2000

Hand in this prac sheet on your next practical day, correctly packaged in a transparent folder and with your solutions. This prac sheet forms the "cover sheet". Since the practical will have been done on a group basis, please hand in one copy of the prac sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.

Your surname: (PRINT CLEARLY)          Your prac day:

Your student number: (PRINT CLEARLY)

Your signature:                        Your group partners:


Other people with whom you collaborated (you need not give tutors' names):



Mark awarded:                          Tutor's signature


Objectives

The objectives of this practical are to familiarize you with simple applications of the Coco/R parser generator, and to give you practice in writing grammars that describe simple language features.

You will need this prac sheet and your text book.

Copies of this prac sheet and of the Clang report are also available on the web site for the course.


To hand in:

This week you are required to hand in, besides this cover sheet:

I do NOT require listings of any C++ or Pascal code produced by Coco/R.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 10 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box in the secretary's office for collection by Pat Terry.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so.


Task 1 - a trivial task

Unpack the prac kit PRAC22C.ZIP (C++ fans) or PRAC22P.ZIP (Pascal fans).

In these files will be found a whole host of goodies. Among them are:


Task 2 - Simple use of Coco/R - a quick task

In the kit you will find CALC.ATG. This is essentially the calculator grammar from page 62 of the textbook, with a slight (cosmetic) change of name.

Use Coco/R to generate a parser for data for this calculator. You do this most simply by giving the command

CR CALC.ATG

Used like this, Coco/R will simply generate three important components of a calculator program - the parser, scanner, and main driver program. Cocol specifications can be programmed to generate a complete calculator too (ie one that will evaluate the expressions, rather than simply check them for syntactic correctness), but that will have to wait for the early hours of another day.

(Wow! Have you ever written a program so fast in your life before?)

Of course, having Coco/R write you a program is one thing. But it might also be fun and interesting to compile the program and see that it works. How you do this depends on what system you want to use.

Pascal system

This is the easier to use, because of the simplicity of the Pascal libraries.

Assuming that your grammar has a start symbol called XXXX, and that the file specifying the grammar is called XXXX.ATG, issuing the commands

CR XXXX
TPC /M XXXX
XXXX DATA.TXT

will generate the system (CR), compile and link it (TPC) and then run it (XXXX).

C++ systems

For C++ systems you can keep firing up those awful environments, creating project files, specify all sorts of other paths and options, and generally have lots of fun getting lost, but for our purposes the best idea is to make use of the "make" system.

MAKE is a utility that allows one to specify various component parts of a large system and how they are interrelated, and to recompile and relink the system easily when part of it is updated. MAKE takes its parameters from a so-called "makefile" - if you study these you will see that a fairly understandable pattern emerges. However, generating makefiles is sometimes awkward, so I have developed a program, included in the C++ kit, that will generate you a makefile for Coco/R projects, suitable for using either Borland C++ or Visual C++, as you prefer.

Assuming that your grammar has a start symbol called XXXX, and that the file specifying the grammar is called XXXX.ATG, issuing the command

GENMAKE XXXX BORLAND

or

GENMAKE XXXX VISUAL

will generate a file XXXX.MAK that you can then apply as a parameter to the MAKE command (for Borland C++) or as a parameter to the NMAKE command (for Visual C++). After generating the makefile (which you need do only once for each task), a command like

MAKE -f XXXX.MAK

or

NMAKE -f XXXX.MAK

will (a) generate the system from the grammar (b) compile the parts of the system and (c) link it all together, whereafter a command like

XXXX DATA.TXT

will run the program XXXX.EXE and try to parse the file DATA.TXT, sending error messages to the screen. Giving the command in the form

XXXX DATA.TXT /L

will send an error listing to the file DATA.LST, which might be more convenient. Try this out on the calculator:

       GENMAKE CALC BORLAND               GENMAKE CALC VISUAL 
       MAKE -f CALC               or      NMAKE -f CALC 
       CALC DATA.TXT                      CALC DATA.TXT 
       CALC DATA.BAD                      CALC DATA.BAD 

The makefile also allows you some other options. Specifically

MAKE -f CALC GENERATE will simply run the Coco/R phase for CALC
MAKE -f CALC CLEAN will simply delete the OBJ and EXE files for CALC
MAKE -f CALC NEW will cleanup and remake the CALC system from scratch

Well, you did all that. Well done. What next?

For some light relief and interest you might like to look at the code the system generated for you - you don't have to comment this week, simply gaze in awe. Don't take too long over this, because now you have the chance to be more creative!

Wait - we have not finished yet. Modify the grammar so that you can use parentheses and leading signs in your expressions, and also give the calculator a factorial function as in -4! + (3 - 4) .

Well, of course, it does not have any real "calculator" capability - it cannot calculate anything (yet). It only has the ability to recognise or reject expressions at this stage. Try it out with some expressions that use the new features, and some that use them incorrectly.

Warning. Language design and grammar design is easy to get wrong. Think hard about these problems before you begin, and while you are doing them.


Task 3 - Using the Cocol form of EBNF to describe the traditional form of BNF

The file EBNF.ATG contains a Cocol grammar that describes EBNF using EBNF conventions. Try this out "as is" to begin with:

      GENMAKE EBNF BORLAND (C++)      CR EBNF.ATG    (Pascal) 
      MAKE -f EBNF.MAK                TPC /M EBNF 
      EBNF EBNF.TXT                   EBNF EBNF.TXT 
      EBNF EBNF.BAD                   EBNF EBNF.BAD 

Next, use the grammar as a guide to develop a new system that will recognise or reject a set of productions written one to a line in "traditional" BNF notation, like those below (these examples can be found in the file BNF.TXT; note that you can obtain the e character by using ALT-238 on the numeric keypad).

      <name>               ::= <title> <first part> <surname>
      <title>              ::= Mr | Miss | Ms | Dr | Prof | e
      <first part>         ::= <name> | <first part> <name> | e
      <surname>            ::= <name>
      <name>               ::= <letter> | <letter> <name>
      <letter>             ::= <first half letter> | <second half letter>
      <first half letter > ::= a | b | c | d | e | f | g | h | i | j | k | l | m
      <second half letter> ::= n | o | p | q | r | s | t | u | v | w | x | y | z


Task 4 - Boolean algebra

Develop a Cocol grammar BOOL.ATG for Boolean expressions like those you should remember from your Boolean Algebra course in CSC 201. In this context, allow your Boolean expressions to be terminated with an = operator, and to include parentheses, single-letter variables, the constants FALSE and TRUE (sometimes written as 0 and 1), and the operators AND (written either as AND or "&" or as a "dot", or simply omitted), OR (written either as OR or as "|" or as a "plus") and NOT, written either as a prefix NOT or as a suffix apostrophe. So some examples of Boolean expressions might be (these are in the file BOOL.TXT)

      a AND B OR (C OR NOT D) =     a . b + (c + d') =     a b + (c + d') =
      NOT (a OR b) AND TRUE =       (a + b)' . 1 =         (a + b)' AND 1 =
      b AND NOT C AND D =           b . c' . d =           b c' d =

Note that there is a precedence ordering among Boolean operators - parentheses take precedence over NOT, which takes precedence over AND, which takes precedence over OR.

To run your system:

      GENMAKE BOOL BORLAND (C++)      CR BOOL.ATG    (Pascal) 
      MAKE -f BOOL.MAK                TPC /M BOOL 
      BOOL BOOL.TXT                   BOOL BOOL.TXT 

And here is something to think about. Draw out the parse tree that would correspond to the expression NOT (A OR B). Then draw out the parse tree that would correspond to NOT A AND NOT B. De Morgan will tell you that these expressions mean the same thing. Do you get the same tree in each case? Discuss the implications.


Task 5 - The index to a book

In the prac kit you will find an extract from the index to my forthcoming bestseller "Hacking out a Degree" in the file INDEX.TXT. It looks something like this:

      abstract class 12, 45
      abstraction, data 165
      advantages of Modula-2  1-99, 100-500, Appendix 4
      aegrotat examinations -- see unethical doctors
      class attendance, intolerable 745
      deadlines, compiler course -- see sunrise
      horrible design (C and C++) 34, 45, 56-80
      lectures, missed 1, 3, 5-9, 12, 14-17, 21-25, 28
      loss of DP certificate 2000
      recursion -- see recursion
      senility, onset of 21-24, 105
      subminimum 40
      supplementary exams 45 - 49
      wasted years 1998-2000

Develop a grammar INDEX.ATG that describes this and similar indexes (indices?) and create a program that will analyse an index and accept or reject it (syntactically).


Task 6 - So what if Clang is so restrictive - fix it!

In the prac kit you will find the grammar for Clang taken from page 112 of the text. Generate a program from this that will recognise or reject simple Clang programs, and verify that the program behaves correctly with the two sample programs in the kit VOTE.CLN and BAD.CLN.

      GENMAKE CLANG BORLAND (C++)      CR CLANG.ATG    (Pascal) 
      MAKE -f CLANG.MAK                TPC /M CLANG 
      CLANG VOTE.CLN                   CLANG VOTE.CLN 

and

      CLANG BAD.CLN                    CLANG BAD.CLN 

Now go on to extend the grammar so that you can declare (and refer to) variables of rather more interesting types than at present. Allow a programmer to declare variables of int, char or bool types, arrays of these types, and even structures containing fields of these types. As an example of the sort of thing we have in mind, consider the following code (VARS.CLN):

    PROGRAM Silly;
    (* Demonstrate extended variable declarations and access to them *)
      INT I, J, List[10];
      CHAR String[100];
      BOOL Found, Searching, Sieve[100];
      STRUCT {
        CHAR Surname[25], FirstName[25];
        BOOL Clever;
        INT  Age;
      } Lecturer, Students[55];

      BEGIN
        Read(I);
        Read(Students[I].Clever, Lecturer.Name[12]);
        List[I] := Lecturer.Age;
        ....
      END.

Finally, extend your Clang grammar so that Clang programs can also include "inline" stack assembler code. Some compilers allow the programmer to program mainly in high-level code, but to inject low-level code where this might lead to greater efficiency than otherwise possible, or to produce special effects which the high level code is not suited or able to do. Here is an example of a program with such code (in the file INLINE.CLN) - of course this example program is not supposed to "do" anything, simply to demonstrate the idea:

    PROGRAM Silly;
    (* Demonstrate in-line assembler code *)
      INT I, J, K;
      BEGIN
        Read(I);
        INLINE
          PSH -1  LIT 6 ADD POP -2
        END;
        WHILE I < 10 DO BEGIN INLINE
            ADR -1
            VAL
            PRS 'The value of I is'
            PRN
          END;
          I := I + 3
        END;
        IF I > 12 THEN INLINE HLT (* give up *) END
      END.

Hint: All we require at thsi stage is the ability to describe these features. You do not have to try to give them any semantic meaning or write code to allow you to use them in any way. In later pracs we might try to do that, but please stick to what is asked for this time, and don't go being over ambitious.


Appendix: Practical considerations when using Coco/R

Firstly note that the QEdit editor system has been configured for you so that you can easily edit and check Cocol files, provided that these have the standard suffix .ATG. Thus a recommended way to proceed is as follows:

Start the editor

Q GRAMM.ATG

and then edit your grammar in the usual way.

From within the editor CTRL-F9 or CTRL-F10 will invoke the Coco/R compiler CR.EXE. If there are any errors, you will be able to move from one error location to the next by using the F4 key (the reason for the error is given in the lower window), or by using ALT-1 through ALT-9 to move to the 1st through 9th errors.

You can, of course, prepare ATG files with other ASCII editors, and you can also run Coco/R free standing with a command like:

CR XXXX.ATG

in which case the system will produce you a listing of the grammar file and the associated error messages, if any, in a corresponding file XXXX.LST.

For ease of use with the makefile concept and with various DOS file handling conventions, please

If you are using C++, make sure that the grammar includes the "pragma" $X (which generates C++ code, rather than C code). If you are using Pascal, make sure you include the pragma $N (which generates more readable code). Whether you are using C++ or Pascal, make sure that your grammar includes the pragma $C (which ensures that you generate a driver (main) program).

The first line of your grammar description should thus always read something like

COMPILER Name $XC /* generate C++, driver routine */

or

COMPILER Name $NC /* Pascal - generate token names, driver routine */

In fact you can include all three pragmas quite safely

COMPILER Name $NCX /* Standard pragmas for Pascal, Modula or C++ */

Error checking by Coco/R takes place in various stages. The first of these relates to simple syntactic errors - like leaving off a period at the end of a production. These are usually easily fixed. The second stage consists of ensuring that all non-terminals have been defined with right hand sides, that all non-terminals are "reachable", that there are no cyclic productions, no useless productions, and in particular that the productions satisfy what are known as LL(1) constraints.

We shall discuss LL(1) constraints in class in the next week, and so for this prac we shall simply hope that they do not become tiresome. The most common way of violating the LL(1) constraints is to have alternatives for a nonterminal that start with the same piece of string. This means that a so-called LL(1) parser (which is what Coco/R generates for you) cannot easily decide which alternative to take - and in fact will run the risk of going badly astray. Here is an example of a rule that violates the LL(1) constraints:

        assignment =   variableName ":=" expression 
                     | variableName index ":=" expression. 
        index      =   "[" subscript "]" . 

Both alternatives for assignment start with a variableName. However, we can easily write production rules that does not have this snag:

        assignment =   variableName [ index ] ":=" expression .
        index      =   "[" subscript "]" . 

A moment's thought will show that the various expression grammars that we have discussed in class - the left recursive rules like

        expression = term | expression "-" term . 

also violate the LL(1) constraints, and so have to be recast (see the CLANG.ATG file) as

        expression = term { "-" term } . 

to get around the problem.

For the moment, if you encounter LL(1) problems, please speak to the long suffering demonstrators who will (hopefully) be able to help you resolve them.