Tutorial 30 Solution

Computer Science 301 - 2005

Extra tutorials for Swot Week (3) - Coco Miscellany

These problems are taken from several quizes held over the years, and just probe various techniques useful in developing applications with Coco.

1. In the compiler studied in the course the following Cocol code was used for a WhileStatement:

    WhileStatement<StackFrame frame>      (. Label startLoop = new Label(known); .)
    =  "while" "(" Condition ")"          (. Label loopExit = new Label(!known);
                                             CodeGen.branchFalse(loopExit); .)
       Statement<frame>                   (. CodeGen.branch(startLoop);
                                             loopExit.here(); .)
    .

This generates code that matches the outline

         startLoop:    Condition
                       BZE  loopExit
                       Statement
                       BRN  startLoop
         loopExit:

Some authors contend that it would be preferable to generate code that matches the outline

                       BRN  testlabel
         startLoop:    Statement
         testlabel:    Condition
                       BNZ  startLoop

Their argument is that in most situations a loop body is executed many times, and that the efficiency of the system will be markedly improved by executing only one branch instruction on each iteration.

(a) Do you think the claim is justified for an interpreted system such as we have used? Explain your reasoning.
While this might be the case on some pipelined architectures, on the interpreted system used in the course it would probably make little difference. Far more time is likely to be spent executing the many statements that make up the loop body than in executing an extra unconditional branch instruction.
(b) If the suggestion is easily implementable in terms of our code generating functions, show how this could be done. If it is not easily implementable, why not?
It would be hard to implement using the simple minded code generator the students saw in this course, as we should need to delay generating the code for Condition until after we had generated the code for a Block. Generating the various branch instructions would be easy enough, of course. In a more sophisticated compiler that built a tree representation of the program before emitting any code, judicious tree walking would allow one to get the effect relatively simply.

2. In the compiler you (should) have developed to allow for characters as well as integers and booleans it was suggested that you use C-like syntax for casting, for example

           int i = 60;
           char c = (char) (4 + i);
           c = (char) (int) 'a';
           i = (int) c + 45

In Modula-2, what amounts to casting operations uses a syntax exemplified by

           i = 60;
           c = CHAR(4 + i);
           c = CHAR( INTEGER('a') );
           i = INTEGER(c) + 45;

How would you alter your compiler to use this latter approach. Which strikes you as "better"? Justify your answer.

The compiler as developed in the prac had code for the Primary production reading

    Primary<out int type>                 (. type = Entry.noType;
                                             int size;
                                             DesType des;
                                             ConstRec con; .)
    =    Designator<out des>              (. type = des.type;
                                             switch (des.entry.kind) {
                                               case Entry.Var:
                                                 CodeGen.dereference();
                                                 break;
                                               case Entry.Con:
                                                 CodeGen.loadConstant(des.entry.value);
                                                 break;
                                               default:
                                                 SemError("wrong kind of identifier");
                                                 break;
                                             } .)
       | Constant<out con>                (. type = con.type;
                                             CodeGen.loadConstant(con.value); .)
       | "new" BasicType<out type>        (. type++; .)
         "[" Expression<out size>         (. if (!isArith(size))
                                               SemError("array size must be integer");
                                             CodeGen.allocate(); .)
         "]"
       | "("
         (   "char" ")"
             Factor<out type>             (. if (!isArith(type))
                                               SemError("invalid cast");
                                             else type = Entry.charType;
                                             CodeGen.castToChar(); .)
           | "int" ")"
             Factor<out type>             (. if (!isArith(type))
                                               SemError("invalid cast");
                                             else type = Entry.intType; .)
           | Expression<out type> ")"
         )
    .

One could change the last part of this to read:

       | "char" "(" Expression<out type>  (. if (!isArith(type))
                                               SemError("invalid cast");
                                             else type = Entry.charType;
                                             CodeGen.castToChar(); .)
       | "int" "(" Expression<out type>   (. if (!isArith(type))
                                               SemError("invalid cast");
                                             else type = Entry.intType; .)
         ")"
       | "(" Expression<out type> ")"
    .

The Modula-2 syntax strikes me as being clearer, and more easily avoids the awkward factoring of the grammar needed to ensure that it remains LL(1).

3. The original Parva grammar in chapter 14 of the text book defines expressions as follows (note that this incorporates the idea of allowing "function calls" as part of an expression, which we did not cover in Practical 25):

       Expression = AddExp [ RelOp AddExp ] .
       AddExp     = [ "+" | "-" ] Term { AddOp Term } .
       Term       = Factor { MulOp Factor } .
       Factor     =   Designator [ "(" Arguments ")" ]
                    | Constant
                    | "new" BasicType "[" Expression "]"
                    | "!" Factor
                    | "(" Expression ")" .

What, if anything, would be the difference if this were replaced by

       Expression = AddExp [ RelOp AddExp ] .
       AddExp     = Term { AddOp Term } .
       Term       = Factor { MulOp Factor } .
       Factor     =   Designator [ "(" Arguments ")" ]
                    | Constant
                    | "new" BasicType "[" Expression "]"
                    | "!" Factor
                    | "+" Factor
                    | "-" Factor
                    | "(" Expression ")" .

The second grammar will allow expressions with multiple leading + or - signs, for example

A + + + - - B

That is all very well, if you like that sort of thing. But consider what the effect would be if you left out some of the spaces in the example and submitted the expression to a C++ or Java compiler. The expressions

A ++ + -- B
A + ++ - - B
A + + + - - B

all have different "meanings". Seems a pity to me to have to deal with obscure bugs that might come about from inserting or deleting a few spaces!

4. Here is a true story. A few years ago I received an e-mail from one of the many users of Coco/R in the world. He was looking for advice on how best to write Cocol productions that would describe a situation in which one non-terminal A could derive four other non-terminals W, X, Y, Z. These could appear in any order in the sentential form, but there was a restriction that each one of the four had to appear exactly once. He had realised that he could enumerate all 24 possibilities, on the lines of

A = W X Y Z | W X Z Y | W Y X Z | ....

but observed very astutely that this was very tedious, and also would become extremely tedious if one were to be faced with a more general situation in which one non-terminal could derive N alternatives, which could appear in order but subject to the restriction that each should appear once.

Write the appropriate parts of a Cocol specification that will describe the situation and check that the restrictions are correctly met. (Restrict your answer to the case of 4 derived non-terminals, as above.)

It's the action that counts! Simply relax the syntax to allow any number of W, X, Y, Z to appear in any order; count how many of each there are, and then check afterwards that each has appeared exactly once:

     A
     =              (. int w = 0, x = 0, y = 0, z = 0; .)
       {   W        (. w++; .)
         | X        (. x++; .)
         | Y        (. y++; .)
         | Z        (. z++; .)
       }            (. if (w * x * y * z != 1) SemError("Each of W X Y and Z must appear once"); .) .

Sadly, in the examination where I set this virtually nobody came up with this (though there were some complicated solutions that used actions to build up tables and so on. There were also a few students who came up with solutions where they tried recording strings, but note that the problem was posed in terms of non terminals W, X, Y and Z. Ah well. I keep trying to tell people that behind every problem there is a simple solution waiting to be discovered. There really is!

5. As you should know, if one wishes to let line ends become significant in Cocol grammars one resorts to writing code like

       CHARACTERS
         LF = CHR(10).
       TOKENS
         EOL = LF .

It would not be very difficult - and it might be deemed an improvement - to modify Coco/R to allow for LF and EOL to become "key words" (just as ANY and EOF are key words).

Coco/R is itself defined by a Cocol grammar, and, like the compilers you have just worked with, this can be used to create the scanner, parser and driver classes, which are then compiled along with the classes that handle symbol tables and other data structures. Give a T-diagram representation of how one could produce a new Java version of Coco/R that incorporates the new features.

 .--------------------------.          .--------------------------.          .--------------------------.
 |         Coco.atg         |          |         Coco.java        |          |         Coco.jar         |
 | Cocol  ----------> Java  |          | Cocol -----------> Java  |          | Cocol -----------> Java  |
 |        (modified)        |          |                          |          |        (modified)        |
 `-------.          .--------------------------.          .--------------------------.          .-------'
         |          |        Coco.jar   Scanner|          |           Javac          |          |
         |  Cocol   | Cocol  -------->  Parser |   Java   |  Java  -------->    JVM  |    JVM   |
         |          |        (original) Driver |          |                          |          |
         |          |                    Java  |          |                          |          |
         `------------------.          .--------------------------.          .------------------'
                            |   JVM    |                      ^   |   JVM    |       .----------.
                            | bytecode |     Table.java  -----.   | bytecode |       |    JVM   |
                            |          |     ParserGen.java --|   |          |       `----------'
                            `----------'     ScannerGen.java -'   `----------'       .----------.
                            .----------.     (also compiled)      .----------.       |   80x86  |
                            |   JVM    |                          |   JVM    |       `----------'
                            `----------'                          `----------'
                            .----------.                          .----------.
                            |   80x86  |                          |   80x86  |
                            `----------'                          `----------'

6. Given that you had access to this new version of Coco/R. how would you modify the Parva grammar so that at the end of compilation it could tell you how many lines of source code had been compiled?

Various ways of doing this can be found. One is to modify the grammar :

  Parva
  =  "void"                             (. Entry program = new Entry(); .)
     Ident<out program.name> "(" ")"    (. program.kind = Entry.Fun;
                                           program.type = Entry.voidType;
                                           Table.insert(program);
                                           StackFrame frame = new StackFrame();
                                           Table.openScope();
                                           Label DSPLabel = new Label(known);
                                           CodeGen.openStackFrame(); .)
     WEAK "{" { Statement<frame> }
     WEAK "}"                           (. if (debug) Table.printTable(OutFile.StdOut);
                                           CodeGen.fixDSP(DSPLabel.address(), frame.size);
                                           CodeGen.leaveProgram();
                                           Table.closeScope();
                                           IO.write(token.line); .)   /* ---------------- here ---------- */

Even better, perhaps, might be

     ...
     WEAK "}"                           (. if (debug) Table.printTable(OutFile.StdOut);
                                           CodeGen.fixDSP(DSPLabel.address(), frame.size);
                                           CodeGen.leaveProgram();
                                           Table.closeScope(); .)
     EOF                                (. IO.write(token.line); .)   /* ---------------- here ---------- */

Another way might be to make use of a pragma:

    PRAGMAS
      ...
      LineEnd    = LF.             (. LineCount++; .)

    COMMENTS FROM "//" TO lf
    COMMENTS FROM "/*" TO "*/"

    IGNORE CHR(9) .. CHR(13) - LF .    /* could no longer ignore line ends in quite the same way */

Here LineCount would be a static variable local to the parser, initialized to zero, so that at the end of the goal production one could add a simple action to display its final value.

Pragmas are quite useful for things that you basically would like to avoid, but maybe cannot quite do so.

7. Here is another true story. Some years ago a group of third year Rhodes students formed a team for one of the ACM programming competitions. One of the problems was quite clearly one for which their knowledge of Coco/R meant they could write a simple grammar and literally solve it within minutes. Unfortunately, solutions had to be able to read from standard input and write to standard output, so that they could be tested with (hidden) data by giving a command like

java -jar Solution.jar <data >results

Standard Coco requires that the input file name be supplied as a command line parameter

java -jar Solution.jar data >results

Time did not permit them to modify Coco/R. Given that you have some time, suggest how we could produce a version of Coco/R suited to solving ACM competition problems.

I would not set this sort of question in an exam, though it is a true story nevertheless! What one needs to do is to modify the Driver.frame file (slightly) and the Scanner.frame file (quite a bit) so that the latter can read in the complete source text from "standard input" and store it in the internal large character buffer that Coco/R fills with the source text just before parsing commences.

The curious among you might like to look at TUT30.ZIP (Java version) or TUT30C.zip (C# version) to see how this can actually be done quite easily. My modifications are a bit crude, involving first using a StringBuffer to store the source text and then a for loop to transfer the character form the StringBuffer to the final buffer. I suspect a much better way of doing this is possible (Terry Theorem 3 says there is always a bettr way!)

TUT30.ZIP contains the "tutor payment" problem from Tut 30 which can be solved by running

CMAKE Prac
CRUN Prac prac.txt (to run in the traditional way)

CRUN Prac <prac.txt (to run redirecting StdIn from prac.txt)