Tut 21 Solutions

Computer Science 301 - 2001

Tutorial for week 21 - Solutions - Code generation for a CASE statement

In earlier discussions we have suggested adding a case/switch statement to the CS301 language, with syntax modelled on that found in Modula-2:

  CaseStatement = "CASE" Expression "OF"
                    OneCase { "|" OneCase }
                    "DEFAULT" ":" StatementSequence
                  "END" .
  OneCase       = [ Range { "," Range } ":" StatementSequence ] .
  Range         = IntConst [ ".." IntConst ] .
  IntConst      = [ "+" | "-" ] number .

Is there any need to restrict the labels to the form suggested above?

The statement can be made more general if we change the grammar to

  CaseStatement   = "CASE" Expression "OF"
                      OneCase { "|" OneCase }
                      "DEFAULT" ":" StatementSequence
                    "END" .
  OneCase         = [ Range { "," Range } ":" StatementSequence ] .
  Range           = ConstExpression [ ".." ConstExpression] .
  ConstExpression = Expression .

Here, of course, while ConstExpression is syntactically equivalent to a general expression, semantically we have to check that its factors are all constants, and we have to evaluate it at compile time rather than generate code that will only evaluate it at run time.

Discuss how one would perform semantic checking and generate code for such statements for our stack machine.

We refrain from giving a complete solution here, because that would spoil the fun as you fill in the details for yourselves as part of Practical 21. But here are some suggestions, based on a simple example of such a statement:

    CASE I OF
        1 :      Write('One')
      | 5 .. 8 : Write('Five to eight')
      | 4 :      Write('Four');
      DEFAULT :  Write('Not 1, 4, 5, 6, 7, 8')
    END

The semantic checks that we shall need are:

Ensure that the Type of the selector Expression and the types of all the ConstExpressions are the same.
No label may implicitly or explicitly be attached to more than one option. Put another way: the labels for each option can be thought of as elements of sets, and the pairwise intersections of these sets must be empty (Hey, man, cool! That's kind of like the LL(1) conditions all over again!)

To carry out the checks on the uniqueness of each label we can follow the approach suggested in Practical 19, and construct for each CASE statement a list of labels, perhaps using a generic list handling class (see solution to Practical 19). However, if we are to use this in conjunction with code generation we shall have to have list elements that are more complex than single integers - we need to keep track not only of the value of each label option as it has appeared in the source code, but also of the corresponding place in the object code where the first of the statements associated with that option is to be found.

A suitable structure might be that shown here. This has two data fields, and has also had to overload the == and != operators upon which the position and isMember methods of the List template crucially depend:

  struct caseLabel {
    int         opt; // the label as it appears in the source code
    CGEN_labels lab; // the label of the first statement in the object code
    friend int operator == (caseLabel x, caseLabel y)
      { return x.opt == y.opt; }
    friend int operator != (caseLabel x, caseLabel y)
      { return x.opt != y.opt; }
  };

  typedef List<caseLabel> LabelList;

To understand the code generation, we start by pointing out that each arm (statement sequence) of the CASE statement has an implicit "break" (in C++ terminology) added. Each of these must branch to the first statement past the end of the CASE statement itself. In the case of the example above it is as though we really had

    CASE I OF
        1 :      Write('One');
                 goto CONTINUE;
      | 5 .. 8 : Write('Five to eight');
                 goto CONTINUE;
      | 4 :      Write('Four');
                 goto CONTINUE;
      DEFAULT :  Write('Not 1, 4, 5, 6, 7, 8');
                 goto CONTINUE;
    END;
    CONTINUE : { code following the CASE statement begins here }

Generating these unconditional branch instructions is, of course, complicated by the fact that at the (known) points where we wish to do so (the end of the OneCase subparser) we do not know yet where the CONTINUE point will be. This is a classic case of having to use a back patching technique. Some suggestions for how to do this in the situation where there are likely to be multiple backpatches needed are given in the text in Exercise 15.8 at the end of section 15.1.

For the example above we might end up with code for the inner statements of the CASE statement something like

        10 PRS   'One'
        12 BRN   38
        14 PRS   'Five to eight'
        16 BRN   38
        18 PRS   'Four'
        20 BRN   38
        22 PRS   'Not 1, 4, 5, 6, 7, 8'
        24 BRN   38
           ...
        38 ...

Of course those "break" statements are but one part of the story. How, after evaluating the selector expression at run time, is the system to decide where to branch to find the start of the appropriate "arm" - that is (in the above example), to know whether to set CPU.PC to 10, 14, 18 or 22?

In general this is a very complicated problem to solve, and the sort of code that we might generate will depend on all sorts of factors.

As we analyse a CASE statement, and in particular parse the Range of each arm, we might build up a list of caseLabel entries like this

        opt     lab

        1       10
        5       14
        6       14
        7       14
        8       14
        4       18

Clearly this list could get enormous - for example if we had a CASE statement like

    CASE I OF
           1 .. 1000 : Write('First Millenium');
      | 1001 .. 2000 : Write('Second Millenium');
      DEFAULT :
    END;

For this exercise it will suffice to assume that the user-defined case labels span a rather small set of numbers (as they do in the first example).

There are at least two ways in which such a list, once created, could be of use to generate fairly straightforward selection. Each of them relies on creating a new machine instruction for the interpreter.

In the first method we generate a complex multi-word instruction of the form

     SWI  NumberOfOptions  Opt1 Lab1 Opt2 Lab2 Opt3 Lab3 ... DefaultLab

with the idea that the interpreter will search through the list of the known NumberOfOptions, seeking the value of Opt_j that matches the run-time value of the selector expression. If such a match can be found the search terminates and the CPU.PC register is set to the matching Lab_j and execution continues from there. If no match can be found, CPU.PC is set to DefaultLab and execution continues from there. For our example, the actual code generated would be

     SWI 6  1  10  5  14  6  14  7  14  8  14  4  18  22

In the second method we generate a complex multi-word instruction of the form

     SEL  MinOpt MaxOpt DefaultLabel Lab1 Lab2 Lab3 ... LabN

where the list of Lab_i is derived to contain the object code addresses for each of the values in the range from the lowest possible value of the selector to the highest possible value of the selector expression. This list will have to contain entries set to DefaultLabel for any "missing" values that lie in this range. The interpretation of the SEL opcode is then very efficiently to make a branch on the lines of

     IF SelectorExpression < MinOpt or SelectorExpression > MaxOpt
       THEN CPU.PC := DefaultLabel
       ELSE CPU.PC := List[SelectorExpression - MinOpt]

For our example, the actual code generated would be

        SEL  1  8  22  10  22  22  18  14  14  14  14

Generating either form of code is straighforward enough, once the list of case labels has been completely built up. The second method results in faster object code, as there is no run time search needed. You might like to decide which method leads to the smaller amount of code generated, and decide at compile time which method to use. Fairly obviously, these techniques could result in enormous memory requirements (in either case) if the ranges of case labels are excessive - in such a case a completely different strategy might be called for which we shall not go into here.

Of course the SWI or SEL code cannot be generated at the "top" of the CASE statement, though that is where it is conceptually needed. To handle this situation it is suggested that you generate a forward branch to an undefined address, and then backpatch this to point to the location of the SWI or SEL instruction when you get around to generating it.

Finally, complete code for the example above might thus look like this:

         8 BRN   26                             ; forward to SEL opcode
        10 PRS   'One'                          ; 1 :
        12 BRN   38                             ; break
        14 PRS   'Five to eight'                : 5 .. 8 :
        16 BRN   38                             ; break
        18 PRS   'Four'                         ; 4 :
        20 BRN   38                             ; break
        22 PRS   'Not 1, 4, 5, 6, 7, 8'         ; default point
        24 BRN   38                             ; break
        26 SEL   1 8 22 10 22 22 18 14 14 14 14 ; uses mem[26] .. mem[37]
        38 ...