RHODES UNIVERSITY

November Examinations - 2001 - Solutions


Computer Science 301 - Paper 2 - Solutions

Examiners:                                          Time 3 hours
   Prof P.D. Terry                                  Marks 180
   Prof E.H. Blake                                  Pages 7 (please check!)

Answer all questions. Answers may be written in any medium except red ink.

(For the benefit of future readers of this paper, various free information was made available to the students 24 hours before the formal examination. This included the full text of Section B. During the examination, candidates were given machine executable versions of the Coco/R compiler generator, access to a computer and machine readable copies of the questions.)


Section A [ 95 marks ]

1. Draw a diagram clearly depicting the various phases found in a typical compiler. Indicate which phases belong to the "front end" and which to the "back end" of the compiler. [ 9 marks ]

This is essentially bookwork. The diagram in the text was as follows:

                                         Source code
                                             |
                                             |
                                             |
               .-              Character handler (non-portable) ----.
               |                             |                      |
               |                             |                      |
               |                             |                      |
  Analytic     |            .-------  Lexical analyser  ------------|
  phase        |            |             (Scanner)                 |
               |            |                |                      |
               |            |                |                      |
  (Front end)  |            |--------  Syntax analyser  ------------|
               |            |              (Parser)                 |
               |            |                |                      |
               |            |                |                      |
               |   Table  --+------- Constraint analyser -----------+--  Error
               `-  handler  |    (Static semantic analyser)         |    reporter
                            |                |                      |
               .-           |                |                      |
               |            |--  Intermediate code generator  ------|
               |            |                |                      |
               |            |                |                      |
               |            |                |                      |
  Synthetic    |            `---------  Code optimizer  ------------|
  phase        |                             |                      |
               |                             |                      |
               |                             |                      |
  (Back end)   |                Code generator (non-portable) ------'
               |                             |
               |                             |
               |                             |
               `-              Peephole optimizer (non-portable)
                                             |
                                             |
                                         Object code

2. (a) What is meant by the term "self-compiling compiler"? [ 2 marks ]

A self-compiling compiler is one written in the source language that it is intended to compile. When it compiles this source code it should, of course, reproduce itself.

(b) Describe (with the aid of T-diagrams) how you would perform a "half bootstrap" of a compiler for language X, given that you have access to the source and object versions of a compiler for X that can be executed on machine OLD and wish to produce a self-compiling compiler for language X that can be executed on machine NEW. [ 10 marks ]

This is essentially book work. The solution I was looking for is:

(a) Rewrite the back end of the X compiler to produce NEW code and compile this to give a cross-compiler


       .--------------------------.          .--------------------------.
       |          XtoNEW.X        |          |         XtoNEW.OLD       |
       |   X     ------> NEW code |          |   X     ------> NEW code |
       |                          |          |                          |
       `-------.          .--------------------------.          .-------'
               |          |        XtoOLD.OLD        |          |
               |    X     |   X     ------> OLD-code | OLD-code |
               |          |                          |          |
               `------------------.          .------------------'
                                  |          |
                                  | OLD-code |
                                  |          |
                                  `----------'

(b) Compile the source code of the modified compiler with the cross-compiler to produce the NEW compiler

       .--------------------------.          .--------------------------.
       |          XtoNEW.X        |          |         XtoNEW.OLD       |
       |   X     ------> NEW code |          |   X    -------> NEW code |
       |                          |          |                          |
       `-------.          .--------------------------.          .-------'
               |          |       XtoNEW.OLD         |          |
               |    X     |   X     ------> NEW code | NEW code |
               |          |                          |          |
               `------------------.          .------------------'
                                  |          |
                                  | OLD-code |
                                  |          |
                                  `----------'

3. Consider the following grammar expressed in EBNF for describing the progress of a typical university course:

      Course       = Introduction Section { Section } Conclusion .
      Introduction = "lecture" [ "handout" ] .
      Section      = { "lecture" | "prac" "test" | "tut" | "handout" } "test" .
      Conclusion   = [ "Panic" ] "Examination" .

(a) Develop a recursive descent parser for the grammar (matching the EBNF as given above). Assume that you have suitable Abort, Accept and GetSym routines (which you need not develop), and that GetSym decodes Sym as one of the tokens below. Your system should detect errors, of course, but need not incorporate "error recovery". [ 12 marks ]

{ EOFSym, lectSym, hanSome, pracSym, testSym, tutSym, PANICSYM, examSym, unknownSym }

A satisfactory solution would look something like

    void Course (void) {
    // Course = Introduction Section { Section } Conclusion .
      Introduction();
      Section();
      while (Sym == lectureSym || Sym == pracSym ||
             Sym == tutSym || Sym == hanSome) Section();
      Conclusion();
    }

    void Introduction (void) {
    // Introduction = "lecture" [ "handout" ] .
      Accept(lectSym, "lecture expected");
      if (Sym == HanSome) GetSym();
    }

    void Section (void) {
    // Section = { "lecture" | "prac" "test" | "tut" | "handout" } "test" .
      while (Sym == lectureSym || Sym == pracSym ||
             Sym == tutSym || Sym == hanSome) {
        switch (Sym) {
          case lectSym : GetSym(); break;
          case pracSym : GetSym(); Accept(testSym, "test expected"); break;
          case tutSym  : GetSym(); break;
          case hanSome : GetSym(); break;
        }
      }
      Accept(testSym, "final test expected");
    }

    void Conclusion (void) {
    // Conclusion = [ "Panic" ] "Examination" .
      if (Sym = PANICSYM) GetSym();
      Accept(examSym, "Exam Expected");
    }

(b) What do you understand by the statement "two grammars are equivalent"? [ 2 marks ]

Equivalent grammars produce the same set of sentences, using different sets of productions.

(c) Rewrite these productions so as to produce an equivalent grammar in which no use is made of the EBNF meta-brackets { ... } or [ ... ]. [ 5 marks ]

      Course          = Introduction Section MoreSections Conclusion .
      MoreSections    = Section MoreSections | e .
      Introduction    = "lecture" OptionalHandout .
      OptionalHandout = "handout" | e  .
      Section         = Components "test" .
      Components      = Component Components | e .
      Component       = "lecture" | "prac" "test" | "tut" | "handout" .
      Conclusion      = OptionalPanic "Examination"
      OptionalPanic   = "Panic" | e .

(d) Analyse the equivalent grammar derived in (c) to determine whether it obeys the LL(1) constraints. [ 8 marks ]

The grammar is non-LL(1). OptionalHandout is nullable, and

    FIRST(OptionalHandout) = { "handout" }
    FOLLOW(OptionalHandout) = { "handout", "lecture", "prac", "tut", "test" }

In other respects it is properly LL(1), but candidates should have been able to justify that in some way.

(e) If you found that the grammar did not obey the LL(1) constraints, does that mean that the parser produced in (a) would fail for some valid inputs? If so, give an example of input that could not be parsed; otherwise justify your claim that it would always succeed. [ 4 marks ]

The parser would work. The situation is completely analogous to the "dangling else" problem - if a handout follows the opening lecture it is bound to Introduction, rather than forming part of a subsequent Section.

4. The CS301 language for which a compiler was developed in this course allows for various statements, including a "while" loop. Relevant parts of the attributed grammar are shown below.

     StatementSequence
     =  Statement { WEAK ";" Statement } .

     Statement
     =  SYNC [   Assignment
               | IfStatement      | WhileStatement
               | ReadStatement    | WriteStatement
               | ReturnStatement  | CaseStatement
             ] .

     WhileStatement
     =                             (. CGEN_labels startloop, testlabel, dummylabel; .)
        "WHILE"                    (. CGen->storelabel(startloop); .)
        Condition "DO"             (. CGen->jumponfalse(testlabel, CGen->undefined); .)
        StatementSequence          (. CGen->jump(dummylabel, startloop);
                                      CGen->backpatch(testlabel); .)
        "END" .

An enthusiastic language extender has suggested that CS301 would be greatly improved by the addition of a post-test loop, and has come up with two possibilities:

PostTestStatement = "DO" StatementSequence "WHILE" Condition .

or

PostTestStatement = "DO" StatementSequence "UNTIL" Condition .

(a) Advise her, with reasons, as to whether or not either or both of these suggestions would be acceptable, and which (if either) would be preferable. [ 5 marks ]

PostTestStatement = "DO" StatementSequence "WHILE" Condition .

This won't work. A WhileStatement starts with WHILE, so one would not be able to distinguish a WhileStatement in the StatementSequence from the terminating WHILE of the PostTestStatement.

PostTestStatement = "DO" StatementSequence "UNTIL" Condition .

This of course works.

(b) For the form of your choice, show how the grammar above would be extended to recognise the statement form and generate correct code. [ 5 marks ]

     Statement
     =  SYNC [   Assignment       | PostTestStatement
               | IfStatement      | WhileStatement
               | ReadStatement    | WriteStatement
               | ReturnStatement  | CaseStatement
             ] .

     PostTestStatement
     =                             (. CGEN_labels startloop, dummylabel; .)
        "DO"                       (. CGen->storelabel(startloop); .)
        StatementSequence
        "UNTIL" Condition          (. CGen->jumponfalse(dummylabel, startloop); .) .

5. As you should recall, CS301 is a "strictly typed" language, and expressions of the form

NOT 6
or
TRUE > 56
or
3 + 4 AND 5

are unacceptable. Most strictly typed languages allow for programmers to circumvent (bypass) these restrictions, typically by allowing so-called "type casting", as exemplified by

NOT BOOL(6)
or
INT(TRUE) > 56
or
3 + INT(BOOL(4) AND BOOL(5))

(a) In the free information for this paper appears an attributed grammar for generating code to evaluate expressions which does not incorporate such type casting. Show how the grammar could be altered to do so. (Only write out those parts that would have to change.) [ 6 marks ]

This is trivial if you understand the system! We need a simple modification to Factor

  Factor<TABLE_types &f>
  =                             (. int value;
                                   TABLE_entries entry; .)
       Designator<classset(TABLE_consts, TABLE_vars), entry>
                                (. f = entry.type;
                                   switch (entry.idclass)
                                   { case TABLE_vars :
                                       CGen->dereference(); break;
                                     case TABLE_consts :
                                       CGen->stackconstant(entry.c.value); break;
                                   } .)
     | Number<value>            (. CGen->stackconstant(value); f = TABLE_ints; .)
     | "TRUE"                   (. CGen->stackconstant(1); f = TABLE_bools .)
     | "FALSE"                  (. CGen->stackconstant(0); f = TABLE_bools .)
     | "INT"  "(" Expression<f> (. if (f != TABLE_bools) SemError(218); .)
       ")"                      (. f = TABLE_ints; .)
     | "BOOL" "(" Expression<f> (. if (f != TABLE_ints) SemError(218); .)
       ")"                      (. f = TABLE_bools; .)
     | "(" Expression<f> ")" .

(b) If strict type checking can be bypassed in this way, what advantages or disadvantages do strictly typed languages possess over languages like C++, where Boolean, integer and character types are all compatible? [ 4 marks ]

The advantage is that one gets some measure of protection against grossly stupid programming errors - one has to make a conscious effort to mix types.

6. As you should be aware, IP addresses as used in Internet communication are typically expressed in "dotted decimal" form, as exemplified by 146.231.128.6. The IP address actually represents a 32 bit integer; the four numbers in the quadruple corresponding to successive 8 bit components of this integer. For humans, machine addressing is usually more memorable when expressed in "DNS" format, as exemplified by terrapin.ru.ac.za. Some systems maintain tables of matching addresses, for example

         146.231.122.13      cspt1.ict.ru.ac.za    #comments appear like this
         146.231.128.6       terrapin.ru.ac.za
         146.231.56.10       thistle-sp.ru.ac.za
         147.28.0.62         psg.com

When we moved our CS and IS departments to new premises recently, a decision was made to rename and uniquely renumber all the many machines in our possession. Our system administrators tried to draw up a table like the one above, which was then merged with the existing table in the IT division. Unfortunately, a few mistakes were made, which caused havoc until they were ironed out. For example, there were lines reading

         146.231.122.11235   cspt1.ict.ru.ac.za    #invalid IP address
         146.231.122.15      cspt2.ict.ru.ac.za
         146.231.122.15      cspt3.ict.ru.ac.za    # non-unique IP address

Complete the ATG file below to show how Coco/R could be used to develop a system that would enable a file in this format quickly to be checked and the errors identified. (Hint: make use of the template list handling class that proved useful in various other applications in this course, the interface to which is provided in the free information for this paper). [ 24 marks ]

This is a longish example (hence the high mark allocation). But there is nothing very difficult in it, as the solution will show. The students in this class had written many Cocol grammars that do this sort of token definition and recognition, combined with the use of a list handler template. In fact this example is simple enough that the IPList object could have been declared "globally" which would allow one to eliminate some parameter passing.

    COMPILER Check $XCN

    #include "misc.h"
    #include "genlist.h"
    typedef List<long> IPList;

    IGNORE CASE
    IGNORE CHR(1) .. CHR(31)

    CHARACTERS
      digit   = "0123456789" .
      letter  = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
      eol     = CHR(10) .

    COMMENTS FROM "#" TO eol

    TOKENS
      number = digit { digit } .
      name   = letter { letter | digit | "." | "-" } .

    PRODUCTIONS
      Check
      =                             (. IPList L; .)
      { Entry<L> }                  (. if (Successful()) printf("\n all correct"); .)
      EOF .

      Entry<IPList &L>
      = IPNumber<L> name .

      IPNumber<IPList &L>
      =                             (. long n, m; .)
      Number<n>
      "." Number<m>                 (. n = 256 * n + m; .)
      "." Number<m>                 (. n = 256 * n + m; .)
      "." Number<m>                 (. n = 256 * n + m;
                                       if (L.isMember(n)) SemError(1001);
                                       else L.add(n); .)
      .

      Number<long &n>
      =                             (. char str[100] .)
      number                        (. LexString(str, sizeof(str) - 1);
                                       n = atoi(str);
                                       if (n > 255) SemError(1000); .)
      .

    END Check.


Generic List Template Class - Question 6

  /* Simple generic linked list class.  Only the interface is shown here
     George Wells  --  27 February 1996 */

  template<class T> class List
    { public:
        List (void);                               // Constructor
        List (const List& lst);                    // Copy constructor
        ~List (void);                              // List destructor
        void add (T item, int position = INT_MAX); // Place new item in a List
        void remove (int position, T &item);       // Remove item at position in List
        int length (void);                         // Return number of elements in List
        T& operator[] (int index);                 // Subscript a List
        int position (T item);                     // Return position item in a List (or -1)
        int isMember (T item);                     // True if item is in List
    }; // class List


Expression parser from CS 301 compiler - Question 5

  Expression<TABLE_types &e>
  =                             (. TABLE_types a;
                                   CGEN_labels shortcircuit; .)
     AndExp<e>
     { "OR"                     (. if (shortBoolean) CGen->booleanop(shortcircuit, CGEN_opor) .)
       AndExp<a>                (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                     { SemError(218); e = TABLE_none; }
                                   else e = TABLE_bools;
                                   if (shortBoolean) CGen->backpatch(shortcircuit);
                                   else CGen->binarybooleanop(CGEN_orrop); .)
     } .

  AndExp<TABLE_types &a>
  =                             (. TABLE_types e;
                                   CGEN_labels shortcircuit; .)
     RelExp<a>
     { "AND"                    (. if (shortBoolean) CGen->booleanop(shortcircuit, CGEN_opand) .)
       RelExp<e>                (. if (!(booltypes.memb(a) && booltypes.memb(e)))
                                     { SemError(218); a = TABLE_none; }
                                   else a = TABLE_bools;
                                   if (shortBoolean) CGen->backpatch(shortcircuit);
                                   else CGen->binarybooleanop(CGEN_andop); .)
     } .

  RelExp<TABLE_types &r>
  =                             (. TABLE_types a;
                                   CGEN_operators op; .)
     AddExp<r>
     [ RelOp<op> AddExp<a>      (. if (r == TABLE_bools || a == TABLE_bools) SemError(218);
                                   r = TABLE_bools; CGen->comparison(op) .)
     ] .

  AddExp<TABLE_types &a>
  =                             (. TABLE_types m;
                                   CGEN_operators op; .)
     MultExp<a>
     { AddOp<op> MultExp<m>     (. if (!(arithtypes.memb(a) && arithtypes.memb(m)))
                                     { SemError(218); a = TABLE_none; }
                                   else CGen ->binaryintegerop(op); .)
     } .

  MultExp<TABLE_types &m>
  =                             (. TABLE_types u;
                                   CGEN_operators op; .)
     UnaryExp<m>
     { MulOp<op> UnaryExp<u>    (. if (!(arithtypes.memb(m) && arithtypes.memb(u)))
                                     { SemError(218); m = TABLE_none; }
                                   else CGen ->binaryintegerop(op); .)
     } .

  UnaryExp<TABLE_types &u>
  =    Factor<u>
     | "+" UnaryExp<u>          (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; } .)
     | "-" UnaryExp<u>          (. if (!arithtypes.memb(u)) {
                                     SemError(218); u = TABLE_none; }
                                   else CGen->negateinteger(); .)
     | "NOT" UnaryExp<u>        (. if (!booltypes.memb(u)) SemError(218);
                                   else CGen->negateboolean();
                                   u = TABLE_bools; .) .

  Factor<TABLE_types &f>
  =                             (. int value;
                                   TABLE_entries entry; .)
       Designator<classset(TABLE_consts, TABLE_vars), entry>
                                (. f = entry.type;
                                   switch (entry.idclass)
                                   { case TABLE_vars :
                                       CGen->dereference(); break;
                                     case TABLE_consts :
                                       CGen->stackconstant(entry.c.value); break;
                                   } .)
     | Number<value>            (. CGen->stackconstant(value); f = TABLE_ints; .)
     | "TRUE"                   (. CGen->stackconstant(1); f = TABLE_bools .)
     | "FALSE"                  (. CGen->stackconstant(0); f = TABLE_bools .)
     | "(" Expression<f> ")" .


Section B [ 85 marks ]

Please note that there is no obligation to produce a machine readable solution for this section. Coco/R and other files are provided so that you can enhance, refine, or test your solution if you desire. If you choose to produce a machine readable solution, you should create a working directory, unpack EXAM.ZIP, modify any files that you like, and then copy all the files back to the blank diskette that will be provided.

For several years your predecessors in this course - and even yourselves - have been expected, as part of the practical course, to gain an understanding of a stack machine architecture by preparing programs written in a very limited form of assembler language in which all addressing had to be done in terms of numerical values which students were supposed to calculate for themselves - with understandable frustration setting in every time they inserted or deleted a few statements into programs as they debugged them. During this time students have begged me to give them a "real" assembler in which alphanumeric labels could be used to identify constants, variables, and the destinations of branch instructions. I have, of course, always been too busy to do this, but with 24 hours at your disposal and the expert knowledge you have amassed after studying the translators course this year, you should be able to remedy this situation - and if you succeed you may be able to make some useful pocket money selling your system to the class next year!

We start by observing that, rather than writing code as exemplified by the columns on the left, most people would prefer to write code as exemplified by the columns on the right

ASSEM  $D+           |   ASSEM   $D+                    # Read a list and write it backwards
                     |     CONST                        # High level declarations
                     |       Max = 10;                  # constants
                     |       Width = 6;
                     |     INT                          # variables
BEGIN                |       List[Max], I, Item;
  DSP    13          |     BEGIN                        # DSP 13 can be generated automatically
  ADR   -12          |              ADR     I           # I := 0;
  LIT     0          |              LIT     0           #
  STO                |              STO                 #
  ADR   -13          |        READ  ADR     Item        # LOOP
  INN                |              INN                 #   Read(Item);
  ADR   -13          |              ADR     Item        #
  VAL                |              VAL                 #
  BZE    32          |              BZE     DONE        #   IF Item = 0 THEN EXIT END;
  ADR    -1          |              ADR     List        #
  ADR   -12          |              ADR     I           #
  VAL                |              VAL                 #
  LIT    11          |              LIT     SIZE(List)  #
  IND                |              IND                 #
  ADR   -13          |              ADR     Item        #
  VAL                |              VAL                 #
  STO                |              STO                 #   List[I] := Item;
  ADR   -12          |              ADR     I           #   I++;
  PPP                |              PPP                 #
  BRN     7          |              BRN     READ        # END;
  PRS   'Reversed'   |        DONE  PRS     'Reversed'  # Write('Reversed');
  ADR   -12          |        PRINT ADR     I           # WHILE I > 0 DO
  VAL                |              VAL                 #
  LIT     0          |              LIT     0           #
  GTR                |              GTR                 #
  BZE    59          |              BZE     EXIT        #
  ADR   -12          |              ADR     I           #   I--;
  MMM                |              MMM                 #
  ADR    -1          |              ADR     List        #
  ADR   -12          |              ADR     I           #
  VAL                |              VAL                 #
  LIT    11          |              LIT     Max + 1     #
  IND                |              IND                 #
  VAL                |              VAL                 #
  LIT     6          |              LIT     Width       #   Write(List[I] : 6);
  PRN                |              PRN                 #
  BRN    34          |              BRN     PRINT       # END
  HLT                |        EXIT  HLT                 # RETURN
END.                 |     END.

Since it might be dangerous to place too much reliance on what would be required of an assembler, or indeed determine exactly what is permitted in the assembler language itself from studying this one single example, here are some suggestions for deriving a complete system. (In the exam kit will be found some other example programs to assist in your development of the assembler.)

(a) You can make use of Coco/R, and in particular derive a solution by making use of the attributed grammar and support modules (symbol table handler, code generator, error handler, frame files etc) that were useful in the development of a CS301 compiler/interpreter.

(b) The assembler statements should appear between a bracketing BEGIN and END, and may optionally be preceded by declarations of constants and variables (like Max, Width, List, I and Item) using similar syntax to that found in CS301 programs.

(c) The assembler system should be able to assemble simple programs in which the addressing is all given in absolute form (as in the example on the left), as well as those with alphanumeric names and labels.

(d) Treat the mnemonics as key (reserved) words. Since AND and NOT are mnemonic opcodes, use !, && and || for Boolean operators.

(e) Alphanumeric labels (like READ, PRINT, DONE and EXIT) used as the targets of branch instructions must be uniquely defined. For simplicity, these labels should not be allowed to duplicate identifiers used in the declaration of named constants or variables.

(f) It is acceptable to define labels without ever having branch instructions that referred to them, to have multiple labels defined at one point, or to have multiple branches to one point, for example

                    BRN   START             # Unnecessary, but legal
            START
            LOOP    LIT   6
                    LIT   7
                    PRN
                    BRN   START             # equivalent to BRN LOOP

(g) It would not be acceptable to have branch instructions refer to labels that are never defined, for example

          BEGIN
            LOOP    LIT   6
                    LIT   7
                    PRN
                    BRN   START             # Start is undefined
          END

(h) The LIT and DSP mnemonics should be allowed to take a constant-generating expression as a parameter:

                    DSP  6                  # Absolute form
                    LIT  Max                # Equivalent to LIT  10
                    LIT  Max * 10 + Width   # Equivalent to LIT  106
                    LIT  Size(Array)        # Equivalent to LIT  11

where Size is a pseudo function that can return the storage space needed for the variable quoted as its actual argument (this would clearly be useful in applications that use arrays in particular).

(i) The ADR mnemonic should be allowed to take a (possibly signed) number or a variable name as its parameter. In the case where this name refers to an array a possible extension would be to allow it to have a constant subscript indicating a further offset that could be computed at assemble time, for example:

                    ADR  -1                 # absolute addressing
                    ADR  Item               # equivalent to ADR -13
                    ADR  List               # equivalent to ADR -1
                    ADR  List[0]            # equivalent to ADR -1
                    ADR  List[2]            # equivalent to ADR -3

(j) Not much attention need be paid to type checking - at this level programmers should be relied on to get these semantics correct for themselves.

(k) Apart from situations where they are necessary for separating other alphanumeric quantities, whitespace characters may be used at the coder's discretion to improve the appearance of source code.

(l) In the extended compiler for CS301 you may have made use of additional opcodes to the ones listed below, in particular to handle switch/case statements. For the purposes of this examination you may confine your assembler to the opcodes in the table on page 7.

Instruction set for stack machine

Several of these operations belong to a category known as zero address instructions. Even though operands are clearly needed for operations such as addition and multiplication, the addresses of these are not specified by part of the instruction, but are implicitly derived from the value of the stack pointer SP. The two operands are assumed to reside on the top of the stack and just below the top; in our informal descriptions their values are denoted by TOS (for "top of stack") and SOS (for "second on stack"). A binary operation is performed by popping its two operands from the stack into (inaccessible) internal registers in the CPU, performing the operation, and then pushing the result back onto the stack.

    AND      Pop  TOS  and  SOS,  and  SOS  with  TOS, push result to form new  TOS 
    ORR      Pop  TOS  and  SOS,  or   SOS  with  TOS, push result to form new  TOS 
    ADD      Pop  TOS  and  SOS,  add  SOS  to  TOS, push sum to form new  TOS 
    SUB      Pop  TOS  and  SOS,  subtract  TOS from  SOS, push difference to form new  TOS 
    MUL      Pop  TOS  and  SOS,  multiply  SOS by  TOS, push product to form new  TOS 
    DVD      Pop  TOS  and  SOS,  divide  SOS  by  TOS, push quotient to form new  TOS 
    REM      Pop  TOS  and  SOS,  divide  SOS  by  TOS, push remainder to form new  TOS 
    EQL      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS = TOS, 0 otherwise
    NEQ      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS ¹ TOS, 0 otherwise
    GTR      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS > TOS, 0 otherwise
    LSS      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS < TOS, 0 otherwise
    LEQ      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS £ TOS, 0 otherwise
    GEQ      Pop  TOS  and  SOS,  push 1 to form new  TOS  if  SOS ³ TOS, 0 otherwise
    NEG      Integer negation of TOS 
    NOT      Boolean negation of TOS 

    STK      Dump stack to output (useful for debugging)
    PRN      Pop  TOS  and  SOS, write  SOS to output as an integer value in field width TOS
    PRB      Pop  TOS  and  SOS, write  SOS to output as a Boolean value in field width TOS
    PRS  A   Write the nul-terminated string that is assumed to be stacked in the literal pool from Mem[A]
    NLN      Write a newline (carriage-return-line-feed) sequence
    INN      Read integer value, pop  TOS, store the value that was read in Mem[TOS]
    INB      Read Boolean value, pop  TOS, store the value that was read in Mem[TOS]

    DSP  A   Decrement value of stack pointer  SP  by  A
    LIT  A   Push the integer value  A  onto the stack to form new  TOS
    ADR  A   Push the value  BP + A  onto the stack to form new  TOS.  (This value is conceptually the address
             of a variable stored at an offset  A  within the stack frame pointed to by the base register  BP.)
    IND      (Range checked indexing of array) Pop  TOS  to yield  Size; pop  TOS  and  SOS; if  0 £ TOS < Size
             then subtract  TOS  from  SOS, push result to form new  TOS
    INX      (Unchecked indexing of array) Pop  TOS  and  SOS,  subtract  TOS from  SOS, push result to form new  TOS 
    VAL      (Dereferencing) Pop  TOS,  and push the value of  Mem[TOS]  to form new TOS
    DUP      Push  TOS  to form duplicate copy
    STO      Pop  TOS  and  SOS; store  TOS  in  Mem[SOS]
    PPP      Pop  TOS and increment  Mem[TOS]  by 1
    MMM      Pop  TOS and decrement  Mem[TOS]  by 1

    HLT      Halt
    BRN  A   Unconditional branch to instruction  A
    BZE  A   Pop  TOS,  and branch to instruction  A  if  TOS  is zero
    BAN  A   Branch to instruction  A  if  TOS  is false; else pop TOS
    BOR  A   Branch to instruction  A  if  TOS  is true; else pop TOS
    NOP      No operation

The instructions in the first group are concerned with arithmetic and logical operations, those in the second group afford I/O facilities, those in the third group allow for the access of data in memory by means of manipulating addresses and the stack, and those in the last group allow for control of flow of the program itself.


(a) The "absolute" assembler is trivial - it requires very few lines of code. Just for fun here is a complete attribute grammar for it:

  COMPILER ASSEM $XCN
  /* Stack assembler level 1 grammar - this version for csC 301 exam solution 2001
     P.D. Terry, Rhodes University, 2001 */

  /* This version is just a simple "absolute" assembler - stage 1 */

  #include "misc.h"
  #include "set.h"
  #include "cgen.h"
  #include "table.h"
  #include "report.h"

  bool debug;
  extern TABLE *Table;
  extern CGEN *CGen;
  extern REPORT *Report;  // needed for debugging pragma

  /*--------------------------------------------------------------------------*/

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)

  PRAGMAS
    DebugOn    = "$D+" .          (. Report->debugging = true; .)
    DebugOff   = "$D-" .          (. Report->debugging = false; .)

  CHARACTERS
    cr         = CHR(13) .
    lf         = CHR(10) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    instring   = ANY - "'" - cr - lf - CHR(0) .

  COMMENTS FROM "{" TO "}"
  COMMENTS FROM "#" TO lf          /* ========= addition to original */

  TOKENS
    identifier = letter { letter | digit } .
    number     = digit { digit } .
    string     = "'" (instring | "''") { instring | "''" } "'" .

  PRODUCTIONS
    ASSEM = "ASSEM" Block "." .

    Block = "BEGIN" StatementSequence "END" .

    StatementSequence = { Statement } .

    Statement = ( OneWord | TwoWord | WriteString ) .

    OneWord
    =  ( "ADD" | "AND" | "DUP" | "DVD" | "EQL" | "GEQ" | "GTR" | "HLT" | "INB" |
         "IND" | "INN" | "INX" | "LEQ" | "LSS" | "MMM" | "MUL" | "NEG" | "NEQ" |
         "NLN" | "NOP" | "NOT" | "ORR" | "PPP" | "PRB" | "PRN" | "REM" | "STK" |
         "STO" | "SUB" | "VAL" )
                                  (. char mnemonic[4];
                                     LexName(mnemonic, sizeof(mnemonic) - 1);
                                     CGen->oneword(mnemonic); .) .

    TwoWord
    =  ( "DSP" | "LIT" | "BRN" | "BZE" | "BAN" | "BOR" | "ADR" )
                                  (. char mnemonic[4];
                                     int value;
                                     LexName(mnemonic, sizeof(mnemonic) - 1); .)
       SignedNumber<value>        (. CGen->twoword(mnemonic, value); .) .

    WriteString
    =                             (. char str[600];
                                     CGEN_labels startstring; .)
       "PRS" String<str>          (. CGen->stackstring(str, startstring);
                                     CGen->writestring(startstring); .) .

    SignedNumber<int &value>
    =                             (. int sign = 1 .)
       [   "+"
         | "-"                    (. sign = -1; .)
       ] Number<value>            (. value = sign * value; .) .

  /* the next productions are straight out of the original compiler */

    String<char *str>
    =  string                     (. char local[100];
                                     LexString(local, sizeof(local) - 1);
                                     int i = 0;
                                     while (local[i]) /* strip quotes */
                                     { local[i] = local[i+1]; i++; }
                                     local[i-2] = '\0';
                                     i = 0;
                                     while (local[i]) /* find internal quotes */
                                     { if (local[i] == '\'') /* single quote */
                                       { int j = i;
                                         while (local[j])
                                         { local[j] = local[j+1]; j++; }
                                       }
                                       i++;
                                     }
                                     strcpy(str, local); .) .

    Number <int &num>
    =  number                     (. char str[100];
                                     int i = 0, l, digit, overflow = 0;
                                     num = 0;
                                     LexString(str, sizeof(str) - 1);
                                     l = strlen(str);
                                     while (i <= l && isdigit(str[i]))
                                     { digit = str[i] - '0'; i++;
                                       if (num <= (maxint - digit) / 10)
                                         num = 10 * num + digit;
                                       else overflow = 1;
                                     }
                                     if (overflow) SemError(200); .) .

  END ASSEM.

(b) The definition of constants and variables and the autogeneration of DSP can be lifted verbatim out of the CS301 compiler.

(c) The LIT/DSP opcodes and their constant expression address fields can simply use the constant expression parser productions and actions straight out of the CS301 compiler. The only extra is the ability to add "size(array)" which is requires a simple symbol table lookup and about three lines of code similar to other look ups students should be familiar with from the CS301 compiler.

(d) The ADR opcode and its variable name parameter similarly requires a symbol table look up. It makes sense to hack the Designator production and actions from the original CS301 compiler to do this.

(e) The only real challenge lies in the labels for branch instructions. The obvious way to do this is to have a new "class" of identifier in the symbol table, and to set the info in the symbol table when the labels are declared. But if forward references are found the entries will have to be made ahead of time and a way found to fix up the forward references when the label is finally declared. This is actually closely analogous with similar fixups students should have learned to do in a practical where they had to deal with multiple forward references as "break" points in a switch statement.

(f) One non-obvious point is that it makes sense not to define a labelled "statement" in th obvious way

    Statement         = [ Label ] ( OneWord | Branch | DSPorLIT | LValue | WriteString ) .

but instead to treat a Label as a sort of "empty" statement.

    Statement         = ( Label | OneWord | Branch | DSPorLIT | LValue | WriteString ) .

This allows one conceptually to attach multiple labels to a single code-generating statement. Other attempts to do this lead too easily to non-LL(1) grammars.

A complete solution turns out to rely on only 150 lines or so of new code at the most, and a well prepared student should have been able to extract the essential point in preparing an answer that could be written down under exam conditions very quickly.

The basic grammar looks like this:

  /* this first part is all straight out of the extant compiler */

  COMPILER ASSEM
  /* Stack assembler level 1 grammar - this version for csC 301 exam solution 2001
     P.D. Terry, Rhodes University, 2001 */

  IGNORE CASE
  IGNORE CHR(9) .. CHR(13)

  CHARACTERS
    cr         = CHR(13) .
    lf         = CHR(10) .
    letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
    digit      = "0123456789" .
    instring   = ANY - "'" - cr - lf - CHR(0) .

  COMMENTS FROM "{" TO "}"
  COMMENTS FROM "#" TO lf

  TOKENS
    identifier = letter { letter | digit } .
    number     = digit { digit } .
    string     = "'" (instring | "''") { instring | "''" } "'" .

  PRODUCTIONS
    ASSEM             = "ASSEM" Block "." .
    Block             = { ConstDeclarations | VarDeclarations }
                        "BEGIN" StatementSequence "END" .
    ConstDeclarations = "CONST" OneConst { OneConst } .
    OneConst          = Ident "=" CExpression ";" .
    VarDeclarations   = ( "INT" | "BOOL" )  OneVar { "," OneVar } ";" .
    OneVar            = Ident [ UpperBound ] .
    UpperBound        = "[" CExpression "]" .
    CExpression       = CAndExp { "||" CAndExp } .
    CAndExp           = CRelExp { "&&" CRelExp } .
    CRelExp           = CAddExp [ RelOp CAddExp ] .
    CAddExp           = CMultExp { AddOp CMultExp } .
    CMultExp          = CUnaryExp { MulOp CUnaryExp } .
    CUnaryExp         = CFactor | "+" CUnaryExp | "-" CUnaryExp | "!" CUnaryExp .
    CFactor           = Ident | Number | "TRUE" | "FALSE" | "(" CExpression ")" .
    AddOp             = "+" | "-" .
    MulOp             = "*" | "%" | "/" .
    RelOp             = "=" | "<>" | "<" | "<=" | ">" | ">=" .
    Ident             = identifier .
    String            = string .
    Number            = number .

  /* this next part is effectively new */

    StatementSequence = { Statement } .
    Statement         = ( Label | OneWord | Branch | DSPorLIT | LValue | WriteString ) .
    Label             = Ident .

    OneWord           =  "ADD" | "AND" | "DUP" | "DVD" | "EQL" | "GEQ" | "GTR" |
                         "HLT" | "INB" | "IND" | "INN" | "INX" | "LEQ" | "LSS" |
                         "MMM" | "MUL" | "NEG" | "NEQ" | "NLN" | "NOP" | "NOT" |
                         "ORR" | "PPP" | "PRB" | "PRN" | "REM" | "STK" | "STO" |
                         "SUB" | "VAL" .

    DSPorLIT          = ( "DSP" | "LIT" ) CExpression .

    WriteString       = "PRS"  String .

    Branch            = ( "BRN" | "BZE" | "BAN" | "BOR" ) ( Number | Target ) .
    Target            = Ident .

    LValue            = "ADR" ( SignedNumber | Designator ) .
    SignedNumber      = [ "+" | "-" ] Number .
    Designator        = Ident [ "[" CExpression "]" ] .

  END ASSEM.

In more detail - here are suggested additions to the code generator. This is slight overkill, because these routines could be replaced with simple calls to CGen->emit in the few places in the attributed grammar where they appear, and the error checking is really overcautious:

  void CGEN::twoword(char *mnemonic, int adr)
  { STKMC_opcodes opcode = Machine->opcode(mnemonic);
    if (opcode <= STKMC_prs) { emit(int(opcode)); emit(adr); }
    else Report->error(231);
  }

  void CGEN::oneword(char *mnemonic)
  { STKMC_opcodes opcode = Machine->opcode(mnemonic);
    if (opcode > STKMC_prs && opcode != STKMC_nul) emit(int(opcode));
    else Report->error(232);
  }

Here are the additions needed to the table handler - we need an option in the union for recording labels, which are characterised by their object code addresses and whether they have been declared or not. The address field here is used in a clever way later on to help set up a back-patch list for the situations where a label is referenced before it has been declared/defined.

  enum TABLE_idclasses { TABLE_consts, TABLE_vars, TABLE_progs, TABLE_labels };

  struct TABLE_entries {
    TABLE_alfa name;             // identifier
    TABLE_idclasses idclass;     // class
    TABLE_types type;
    int self;
    union {
      struct {
        int value;
      } c;                       // constants
      struct {
        int size, offset;
        bool scalar;
      } v;                       // variables
      struct {
        CGEN_labels address;
        bool defined;
      } l;                       // labels
    };
  };

Here are the additions needed to the compiler frame file - a few more error messages need to be defined:

  char *cs301Error::GetUserErrorMsg(int n)
  { switch (n) {
      ...
      case 229: return "Undefined labels";
      case 230: return "Redefined labels";
      case 231: return "Opcode requires addressfield";
      case 232: return "Opcode may not have addressfield";
      default:  return "Compiler error";
    }
  }

And here is the complete attribute grammar:

   COMPILER ASSEM $CNX
   /* Stack assembler level 1 grammar - this version for csC 301 exam solution 2001
      P.D. Terry, Rhodes University, 2001 */

   /* The first part of this is effectively identical to the same grammar used
      in the CS301 compiler.  A few lines have been deleted to tidy things up,
      but they could have remained in place with no difficulty */

   #include "misc.h"
   #include "set.h"
   #include "cgen.h"
   #include "table.h"
   #include "report.h"

   typedef Set<7> classset;
   typedef Set<4> typeset;
   bool debug;
   extern TABLE *Table;
   extern CGEN *CGen;
   extern REPORT *Report;  // needed for debugging pragma

   typeset arithtypes, booltypes;

   /*--------------------------------------------------------------------------*/

   IGNORE CASE
   IGNORE CHR(9) .. CHR(13)

   PRAGMAS
     DebugOn    = "$D+" .          (. Report->debugging = true; .)
     DebugOff   = "$D-" .          (. Report->debugging = false; .)

   CHARACTERS
     cr         = CHR(13) .
     lf         = CHR(10) .
     letter     = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" .
     digit      = "0123456789" .
     instring   = ANY - "'" - cr - lf - CHR(0) .

   COMMENTS FROM "{" TO "}"
   COMMENTS FROM "#" TO lf          /* ========= addition to original */

   TOKENS
     identifier = letter { letter | digit } .
     number     = digit { digit } .
     string     = "'" (instring | "''") { instring | "''" } "'" .

   PRODUCTIONS
     Block
     =                             (. int framesize = 0; .)
        SYNC { (  ConstDeclarations | VarDeclarations<framesize> )
        SYNC }                     (. /* reserve space for variables */
                                      CGen->openstackframe(framesize); .)
        "BEGIN" StatementSequence  (. /* ======  CGen->leaveprogram(); can be deleted */
                                      if (Report->debugging) Table->printtable(stdout); .)
        "END" .

     ConstDeclarations
     =  "CONST" OneConst { OneConst } .

     OneConst
     =                             (. TABLE_entries entry; .)
        Ident<entry.name>          (. entry.idclass = TABLE_consts; .)
        WEAK "="
        CExpression<entry.type, entry.c.value>
        ";"                        (. Table->enter(entry); .) .

     VarDeclarations<int &framesize>
     =                             (. TABLE_types t; .)
       (   "INT"                   (. t = TABLE_ints; .)
         | "BOOL"                  (. t = TABLE_bools; .)
       )  OneVar<framesize, t> { WEAK "," OneVar<framesize, t> } ";" .

     OneVar<int &framesize, TABLE_types t>
     =                             (. TABLE_entries entry; .)
                                   (. entry.idclass = TABLE_vars;
                                      entry.v.size = 1; entry.v.scalar = true;
                                      entry.type = t;
                                      entry.v.offset = framesize + 1; .)
        Ident<entry.name>
        [ UpperBound<entry.v.size> (. entry.v.scalar = false; .)
        ]                          (. Table->enter(entry);
                                      framesize += entry.v.size; .) .

     UpperBound<int &size>
     =                             (. TABLE_types type; .)
       "[" CExpression<type, size> (. if (!arithtypes.memb(type)) SemError(226);
                                      size++; .)
       "]" .

     CExpression<TABLE_types &e, int &value>
     =                             (. TABLE_types a;
                                      int aval; .)
        CAndExp<e, value>
        { "||" CAndExp<a, aval>    (. if (!(booltypes.memb(e) && booltypes.memb(a)))
                                        { SemError(218); e = TABLE_none; }
                                      else e = TABLE_bools;
                                      value = value || aval; .)
        } .

     CAndExp<TABLE_types &a, int &value>
     =                             (. TABLE_types e;
                                      int eval; .)
        CRelExp<a, value>
        { "&&" CRelExp<e, eval>    (. if (!(booltypes.memb(a) && booltypes.memb(e)))
                                        { SemError(218); a = TABLE_none; }
                                      else a = TABLE_bools;
                                      value = value && eval; .)
        } .

     CRelExp<TABLE_types &r, int &value>
     =                             (. TABLE_types a;
                                      int aval;
                                      CGEN_operators op; .)
        CAddExp<r, value>
        [ RelOp<op>
          CAddExp<a, aval>         (. if (r == TABLE_bools || a == TABLE_bools) SemError(218);
                                      r = TABLE_bools;
                                      switch (op) {
                                        case CGEN_opeql :
                                          value = value == aval; break;
                                        case CGEN_opneq :
                                          value = value != aval; break;
                                        case CGEN_oplss :
                                          value = value <  aval; break;
                                        case CGEN_opleq :
                                          value = value <= aval; break;
                                        case CGEN_opgtr :
                                          value = value >  aval; break;
                                        case CGEN_opgeq :
                                          value = value >= aval; break;
                                        default : break;
                                      } .)
        ] .

     CAddExp<TABLE_types &a, int &value>
     =                             (. TABLE_types m;
                                      int mval;
                                      CGEN_operators op; .)
        CMultExp<a, value>
        { AddOp<op>
          CMultExp<m, mval>        (. if (!(arithtypes.memb(a) && arithtypes.memb(m)))
                                        { SemError(218); a = TABLE_none; }
                                      else switch (op) {
                                        case CGEN_opadd :
                                          value += mval; break;
                                        case CGEN_opsub :
                                          value -= mval; break;
                                        default : break;
                                      } .)
        } .

     CMultExp<TABLE_types &m, int &value>
     =                             (. TABLE_types u;
                                      int uval;
                                      CGEN_operators op; .)
        CUnaryExp<m, value>
        { MulOp<op>
          CUnaryExp<u, uval>       (. if (!(arithtypes.memb(m) && arithtypes.memb(u)))
                                        { SemError(218); m = TABLE_none; }
                                      else switch (op) {
                                        case CGEN_opmul :
                                          value *= uval; break;
                                        case CGEN_opmod :
                                          if (uval != 0) value %= uval; else SemError(224);
                                          break;
                                        case CGEN_opdvd :
                                          if (uval != 0) value /= uval; else SemError(224);
                                          break;
                                        default : break;
                                      } .)
        } .


     CUnaryExp<TABLE_types &u, int &value>
     =    CFactor<u, value>
        | "+" CUnaryExp<u, value>  (. if (!arithtypes.memb(u)) {
                                        SemError(218); u = TABLE_none; } .)
        | "-" CUnaryExp<u, value>  (. if (!arithtypes.memb(u)) {
                                        SemError(218); u = TABLE_none; }
                                        value = - value; .)
        | "!" CUnaryExp<u, value>  (. if (!booltypes.memb(u)) SemError(218);
                                        value = ! value;
                                        u = TABLE_bools; .) .

     CFactor<TABLE_types &f, int &value>
     =                             (. TABLE_entries entry;
                                      TABLE_alfa name;
                                      bool found;
                                      value = 1; // play safe .)
          Ident<name>              (. Table->search(name, entry, found);
                                      if (!found)
                                      { SemError(202); /* force declaration */
                                        strcpy(entry.name, name);
                                        entry.type = TABLE_none;
                                        entry.idclass = TABLE_consts;
                                        entry.c.value = 1;
                                        Table->enter(entry);
                                      }
                                      if (entry.idclass == TABLE_consts) value = entry.c.value;
                                      else SemError(206);
                                      f = entry.type; .)
        | Number<value>            (. f = TABLE_ints; .)
        | "TRUE"                   (. f = TABLE_bools; value = 1; .)
        | "FALSE"                  (. f = TABLE_bools; value = 0; .)
        | "SIZE" "("               /* ========== this small addition needed == */
           Ident<name>             (. Table->search(name, entry, found);
                                      if (!found)
                                      { SemError(202); /* force declaration */
                                        strcpy(entry.name, name);
                                        entry.type = TABLE_none;
                                        entry.idclass = TABLE_vars;
                                        entry.v.scalar = true;
                                        entry.v.size = 1;
                                        entry.v.offset = 0;
                                      }
                                      if (entry.idclass == TABLE_vars) value = entry.v.size;
                                      else SemError(206);
                                      f = TABLE_ints; .)
           ")"
        | "(" CExpression<f, value> ")" .

     AddOp<CGEN_operators &op>
     =                             (. op = CGEN_opnop; .)
       (   "+"                     (. op = CGEN_opadd; .)
         | "-"                     (. op = CGEN_opsub; .)
       ) .


     MulOp<CGEN_operators &op>
     =                             (. op = CGEN_opnop; .)
       (   "*"                     (. op = CGEN_opmul; .)
         | "%"                     (. op = CGEN_opmod; .)
         | "/"                     (. op = CGEN_opdvd; .)
       ) .


     RelOp<CGEN_operators &op>
     =                             (. op = CGEN_opnop; .)
       (   "="                     (. op = CGEN_opeql; .)
         | "<>"                    (. op = CGEN_opneq; .)
         | "<"                     (. op = CGEN_oplss; .)
         | "<="                    (. op = CGEN_opleq; .)
         | ">"                     (. op = CGEN_opgtr; .)
         | ">="                    (. op = CGEN_opgeq; .)
       ) .


     Ident<char *name>
     =  identifier                 (. LexName(name, TABLE_alfalength); .) .


     String<char *str>
     =  string                     (. char local[100];
                                      LexString(local, sizeof(local) - 1);
                                      int i = 0;
                                      while (local[i]) /* strip quotes */
                                      { local[i] = local[i+1]; i++; }
                                      local[i-2] = '\0';
                                      i = 0;
                                      while (local[i]) /* find internal quotes */
                                      { if (local[i] == '\'') /* single quote */
                                        { int j = i;
                                          while (local[j])
                                          { local[j] = local[j+1]; j++; }
                                        }
                                        i++;
                                      }
                                      strcpy(str, local); .) .

     Number <int &num>
     =  number                     (. char str[100];
                                      int i = 0, l, digit, overflow = 0;
                                      num = 0;
                                      LexString(str, sizeof(str) - 1);
                                      l = strlen(str);
                                      while (i <= l && isdigit(str[i]))
                                      { digit = str[i] - '0'; i++;
                                        if (num <= (maxint - digit) / 10)
                                          num = 10 * num + digit;
                                        else overflow = 1;
                                      }
                                      if (overflow) SemError(200); .) .

   /* The productions below are specific to this application */

     ASSEM
     =                             (. arithtypes = typeset(TABLE_none, TABLE_ints);
                                      booltypes  = typeset(TABLE_none, TABLE_bools); .)
        "ASSEM" Block "."          (. Table->checklabels(stdout); .) .

     StatementSequence
     =  { Statement } .

     Statement
     =
        SYNC (   OneWord
               | Branch
               | DSPorLIT
               | LValue
               | WriteString
               | Label
             ) .

     Label
     =                             (. TABLE_alfa name;
                                      TABLE_entries entry;
                                      bool found; .)
        Ident<name>                (. Table->search(name, entry, found);
                                      if (!found) {     // new label to be defined
                                        strcpy(entry.name, name);
                                        entry.type = TABLE_none;
                                        entry.idclass = TABLE_labels;
                                        CGen->storelabel(entry.l.address);
                                        entry.l.defined = true;
                                        Table->enter(entry);
                                      }
                                      else {            // only now defined but had forward reference
                                        if (entry.idclass != TABLE_labels) SemError(206);
                                        else {          // must not already been defined
                                          if (entry.l.defined) SemError(230);
                                          else {        // fix the forward references
                                            CGen->backpatchlist(entry.l.address);
                                                        // and update the symbol table
                                            CGen->storelabel(entry.l.address);
                                            entry.l.defined = true;
                                            Table->update(entry);
                                          }
                                        }
                                      }
                                   .) .

     OneWord
     =  ( "ADD" | "AND" | "DUP" | "DVD" | "EQL" | "GEQ" | "GTR" | "HLT" | "INB" |
          "IND" | "INN" | "INX" | "LEQ" | "LSS" | "MMM" | "MUL" | "NEG" | "NEQ" |
          "NLN" | "NOP" | "NOT" | "ORR" | "PPP" | "PRB" | "PRN" | "REM" | "STK" |
          "STO" | "SUB" | "VAL" )
                                   (. char mnemonic[4];
                                      LexName(mnemonic, sizeof(mnemonic) - 1);
                                      CGen->oneword(mnemonic); .) .


     DSPorLIT
     =                             (. char mnemonic[4];
                                      TABLE_types type;
                                      int value; .)
        ( "DSP" | "LIT" )          (. LexName(mnemonic, sizeof(mnemonic) - 1); .)
        CExpression<type, value>   (. CGen->twoword(mnemonic, value); .) .

     WriteString
     =                             (. char str[600];
                                      CGEN_labels startstring; .)
        "PRS" String<str>          (. CGen->stackstring(str, startstring);
                                      CGen->writestring(startstring); .) .


     Branch
     =  ( "BRN" | "BZE" | "BAN" | "BOR" )
                                   (. int destination;
                                      char mnemonic[4];
                                      LexName(mnemonic, sizeof(mnemonic) - 1); .)
        (   Number<destination>
          | Target<destination>
        )                          (. CGen->twoword(mnemonic, destination); .) .


     Target<int &destination>
     =                             (. TABLE_alfa name;
                                      TABLE_entries entry;
                                      bool found; .)
        Ident<name>                (. Table->search(name, entry, found);
                                      if (found) {
                                        if (entry.idclass == TABLE_labels) {
                                          destination = entry.l.address;
                                          if (!entry.l.defined) {
                                                        // another forward reference
                                            CGen->storelabel(entry.l.address);
                                            Table->update(entry);
                                          }
                                        }
                                        else SemError(206);
                                      }
                                      else {            // first forward reference
                                        strcpy(entry.name, name);
                                        entry.type = TABLE_none;
                                        entry.idclass = TABLE_labels;
                                        CGen->storelabel(entry.l.address);
                                        entry.l.defined = false;
                                        destination = CGen->undefined;
                                        Table->enter(entry);
                                      }
                                   .) .


     LValue
     =                             (. int value, offset; .)
        "ADR"
        (   SignedNumber<value>    (. CGen->twoword("ADR", value); .)
          | Designator<offset>     (. CGen->twoword("ADR", -offset); .)
        ) .

     SignedNumber<int &value>
     =                             (. int sign = 1 .)
        [   "+"
          | "-"                    (. sign = -1; .)
        ] Number<value>            (. value = sign * value; .) .


   /* This next one is very similar to the designator used in the CS301 compiler
      but needs to return the offset address of the variable or array element */

     Designator<int &offset>
     =                             (. TABLE_alfa name;
                                      TABLE_types exptype;
                                      TABLE_entries entry;
                                      bool found;
                                      int value; .)
        Ident<name>                (. Table->search(name, entry, found);
                                      if (!found)
                                      { SemError(202);  // force declaration
                                        strcpy(entry.name, name);
                                        entry.type = TABLE_none;
                                        entry.idclass = TABLE_vars;
                                        entry.v.scalar = true;
                                        entry.v.size = 1;
                                        entry.v.offset = 0;
                                      }
                                      if (entry.idclass == TABLE_vars) offset = entry.v.offset;
                                      else { SemError(206); offset = 0; return; } .)
        ( "["                      (. if (!found) { entry.v.scalar = false; Table->enter(entry); }
                                      if (entry.v.scalar) SemError(204); .)
          CExpression<exptype, value>
                                   (. if (!arithtypes.memb(exptype)) SemError(227);
                                      offset += value;
                                      if (value < 0 || value >= entry.v.size)
                                        SemError(223); .)
          "]"
          |                        (. if (!found) Table->enter(entry); .)
        ) .

   END ASSEM.


Home  © P.D. Terry