11 NON-NUMERICAL SCALAR TYPES

Extract from "An Introduction to Programming with Modula-2" (Terry, 1986).

At several points in the previous chapters we have suggested that there may be a need to deal with data other than the simple REAL, CARDINAL and INTEGER numerical data which have been used almost exclusively until now. Modula-2 is one of very few languages which allows this to be done easily, introducing, as it does, some decidedly non-numerical, but still ordinal, scalar types. These are the user-defined "enumerated scalars", and the standard type BOOLEAN.

11.1 Enumerated scalars

In section 10.1 we saw how we could define our own explicitly named types by means of TypeDeclarations. The main concern there was in declaring array types, and subrange types for indexing these. The fundamental scalar types REAL, CARDINAL, INTEGER and CHAR are said to be predefined standard types. In many programs one is tempted to use these types, and especially CARDINAL, to represent data which is not really alphanumeric at all. For example, in dealing with days of the week and months of the year, a programmer might be tempted to declare

      VAR
        Weekday, Month :  CARDINAL;

and then restrict the program to using the values 1 through 7 for Weekday, and 1 through 12 for Month. This might be improved by declaring

      VAR
        Weekday : CARDINAL [1 .. 7];
        Month   : CARDINAL [1 .. 12];

and it might be argued that days and months have an obvious numerical correspondence. (Or do they? Is Sunday the first day of the week, or is it Monday?)

In other applications any numerical correspondence is more arbitrary. For example, in dealing with car hire firms one might be tempted to use

      VAR
        Firm : CARDINAL;

but which Firm is really Number 1? And which number means "tries harder"? Then again, after hiring a car one usually finds that it can start, stop, go forward, backwards, up hill and down dale, and turn left and right. Which are the best numbers for describing each of a set of directions?

In Modula-2 it is possible to introduce a so-called EnumeratedType, simply by quoting a list of identifiers which represent the possible values which objects of that type may have. This may be described by the syntax diagram

EnumeratedType

    ---------------> ( ---.--> ConstantIdentifier ---.---> ) ------>
                          |                          |
                          `--------- , <-------------'

TypeDeclarations which incorporate the EnumeratedType are exemplified by

      TYPE
        DAYS    = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
        SUITS   = (Clubs, Diamonds, Hearts, Spades);
        FORDS   = (Escort, Cortina, Fiesta, Anglia);
        BEATLES = (John, Paul, George, Ringo);
        COLOURS = (Red, Orange, Yellow, Green, Blue, Indigo, Violet);

The way in which the list is written also imposes an order on these constants, and sometimes this is useful - the above examples imply that

              Monday < Tuesday < . . . . . < Saturday < Sunday
              John < Paul < George < Ringo

Variables may be declared to be of these new types, in the usual way, for example

      VAR
        Day, Holiday : DAYS;
        PopStar      : BEATLES;
        Car, Wreck   : FORDS;
        Trump        : SUITS;
        Year         : ARRAY [1 .. 365] OF DAYS;

Alternatively, an anonymous EnumeratedType may be defined directly in the VariableDeclarations, for example

      VAR
        Day, Holiday : (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
        Instrument   : (Drum, Guitar, Mouthorgan, Bass);
        CarsOnShow   : ARRAY [1 .. 10] OF (Escort, Cortina, Fiesta, Anglia);

but the former way is to be preferred, especially when variables and parameters of a given type are to be declared in several different procedures. As in the case of named array types, explicitly named enumerated types strengthen the abstraction process.

It should be emphasized that a ConstantIdentifier must be of a unique type in any one block. In the case of the EnumeratedType this means, in particular, that it is not possible to select "subsets" and write, for example

     TYPE
       RAINBOWCOLOURS = (Red, Orange, Yellow, Green, Blue, Indigo, Violet);
       PRIMARYCOLOURS = (Red, Yellow, Blue);
     VAR
       BrassBandInstrument : (Trombone, Trumpet, Tuba, SideDrum, BassDrum);
       PipeBandInstrument  : (BassDrum, SideDrum, TenorDrum, Bagpipe);
       VeryNoisyInstrument : (Trombone, Trumpet, Tuba);

One could, however, have written

      TYPE
        RAINBOWCOLOURS = (Red, Yellow, Blue, Green, Indigo, Violet, Orange);
        PRIMARYCOLOURS = RAINBOWCOLOURS [ Red .. Blue ] (*subrange*);

only this second case has lost the ordering for RAINBOWCOLOURS which ties in with physical reality.

Notice that the list is one of ConstantIdentifiers, and not literal constants or ConstantExpressions. One might be tempted to make declarations like

      TYPE
        SIGNS    = ( '+' , '-' , '*' , '/' )              (*++ illegal ++*);
        BRACKETS = ( '(' , ')' , '[' , ']' , '{' , '}' )  (*++ illegal ++*);

but this is not allowed.

The EnumeratedType is another OrdinalType, and we may now present a set of diagrams describing all the variations on scalar types.

ScalarType

       ----------.------> OrdinalType ------.
                 |                          |
                 `--------> REAL -----------`----------------------------->

OrdinalType

       ----------.--> OrdinalTypeIdentifier ---.
                 |                             |
                 `-----> EnumeratedType -------|
                 |                             |
                 `------> SubrangeType --------`----------------------------->

SubrangeType

       |
       `-------------> OrdinalTypeIdentifier ------------.
       |                                                 |
       |<-----------------------<------------------------'
       |
       `-> [ -> OrdinalConstantExpression ---> .. ---> OrdinalConstantExpression ---> ] -->

Subranges of an EnumeratedType are quite common, and a complete type or a derived subrange may be used as the IndexType for an ArrayType. For example, one might declare

      TYPE
        GUITARISTS = BEATLES [John .. George];
        WEEKDAYS   = DAYS [Monday .. Friday];
      VAR
        Vocalist        : GUITARISTS;
        Payday, Workday : WEEKDAYS;
        Salary          : ARRAY BEATLES OF REAL;

An EnumeratedTypeIdentifier may be exported from a module. In this case, all of the ConstantIdentifiers mentioned in its definition are automatically exported from that module as well.

So much for the declaration process; how is use made of these types? As might be expected, objects of EnumeratedType may be passed as parameters, and used in simple assignments, provided the rules for compatibility are obeyed. Given the above declarations it is legal to write

             Day := Sunday;
             PopStar := Paul

and, since the constants are ordered, comparisons may be made using relational operators, for example

             IF (Day = Saturday) OR (Day = Sunday)
               THEN RateOfPay := 2 ELSE RateOfPay := 1
             END

or, more simply, given the ordering imposed by letting Monday be the "first" day in the week

             IF Day > Friday THEN RateOfPay := 2 ELSE RateOfPay := 1 END

They may also be used to control CASE statements, FOR loops, and as array subscripts:

             CASE PopStar OF
               John, George : Instrument := Guitar  |
               Paul         : Instrument := Bass    |
               Ringo        : Instrument := Drum
             END (*case*);
             FOR PopStar := John TO Ringo DO Salary[PopStar] := 1.4E23 END

However, types may not be mixed, nor arithmetic performed, so that it is illegal to write

             Day := Paul;
             Day := 3;
             PopStar := Drum + Ringo;

The sort of "arithmetic" one would most want to do would be to increment or decrement the value of a variable - in effect, to move from one value to another one - and this is made possible with the INC and DEC operations that have already been mentioned for INTEGER and CARDINAL variables. For example, code like

             Vocalist := John; INC(Vocalist);
             Day := Sunday; DEC(Day, 2)

leaves Vocalist with the value of Paul, and Day with the value of Friday.

Although at this stage the EnumeratedType may seem little more than a luxury or curiosity, considerable use will be made of it in future examples where a number of options exist in a given situation, possibly not associated with any apparent order.

As an example, we return to the list searching problem discussed in the last chapter. In searching a list for a particular item a program can be in only one of three "states", namely "still looking", "found it" and "realize there is no hope of finding it". This approach can be followed in developing a procedure like

    TYPE POSSIBILITIES = (Absent, Found, Looking);

    PROCEDURE Search (Item : CARDINAL; List : ARRAY OF CARDINAL;
                      VAR Result : POSSIBILITIES);
    (* Search for 'Item' in 'List', return 'Result' one of (Absent, Found) *);
      VAR
        I : CARDINAL;
      BEGIN
        Result := Looking;  I := 0;
        REPEAT
          IF Item = List[I]
            THEN  Result := Found
            ELSIF I # HIGH(List) THEN INC(I)
            ELSE  Result := Absent
          END
        UNTIL Result # Looking
      END Search;

A little thought will probably lead most readers to the conclusion that Modula-2 systems, given declarations like

      TYPE
        STUDENTS = (FirstYear, SecondYear, ThirdYear, Honours);
      VAR
        University : ARRAY [1 .. 3500] OF STUDENTS;

will treat them in much the same way as they would

      CONST
        FirstYear = 0; SecondYear = 1;
        ThirdYear = 2; Honours = 3;
      VAR
        University :  ARRAY [1 .. 3500] OF CARDINAL;

and that the effect of passing from one year to the next, that is

INC(University[I])

is equivalent to

University[I] := University[I] + 1

This is undoubtedly true, and one may wonder at all the fuss - especially as in many other computer languages one has to use exactly this second approach. In defence of the use of enumerated scalar types, we should point out that they discourage the use of such obscure statements as

IF University[I] = 0 THEN (*something*) . . .

in place of

IF University[I] = FirstYear THEN (*something*) . . .

and, while the second set of declarations would allow

University[I] := 352 + 47

this will be forbidden by the first. Of course you would not dream of doing something so stupid, but it could be very difficult to find that sort of mistake in a program written by someone else, especially when there a deadline to meet, and that someone else has long since left the organization!

Clearly occasions will arise when it is necessary to perform the equivalent of arithmetic on variables of these types, or, as is more likely, when they have to be related to variables of other types. For such purposes one may resort to using type conversion functions, as discussed in section 4.9 for handling mixed- mode arithmetic. Enumerated scalars are most likely to be related to CARDINAL quantities. The standard function ORD will map enumerated objects to an equivalent CARDINAL value, and the inverse standard function VAL will map a CardinalExpression to an EnumeratedType. The first identifier in an enumeration maps onto a cardinal value of zero. One may also apply the functions MIN and MAX to enumerated scalars to find the lowest and highest in the list. Thus, given the declaration

      TYPE
        Beatles = (John, Paul, George, Ringo);

             MIN(BEATLES)       returns     John
             MAX(BEATLES)       returns     Ringo
             ORD(George)        returns     2
             VAL(BEATLES , 0)   returns     John

Beginners may at first be attracted to the idea of using enumerated scalars, and then rapidly discouraged from using them for a rather curious reason. There is a great tendency to confuse their values with their names, that is, to regard the values as strings of letters, rather than as hidden numbers. This is exemplified by an expectation that one should be able to use code like

WriteString (WeekDay)

and be presented with output of the form "Friday" or "Monday" or whatever. Alas, using only the features directly available from the EasyInOut library, it is not possible to read or write the "names" of an EnumeratedType directly.

For output one can, of course, resort to code like

WriteCard(ORD(scalar), 4)

(which prints out an CARDINAL instead) and then refer back to the declarations to work out which scalar was meant. This is useful only in emergencies; better schemes, if a little long winded, might make use of a CASE statement like

             CASE Trump OF
               Clubs    : WriteString('Clubs')    |
               Diamonds : WriteString('Diamonds') |
               Hearts   : WriteString('Hearts')   |
               Spades   : WriteString('Spades')
             END (*case*)

Reading values for such variables is even more awkward, and one usually has to resort to the case construction. For a small list of values this is still tolerable, but for a large list it becomes tedious. Standard practice when things become tedious is to try to devise a useful procedure which can be placed in a library, but even this is of limited application here, as enumerated scalar values are named quite arbitrarily.

As a rather pointless example, suppose one had defined a DAYS type as before, and the need arose to assign values of this type to elements of an array whose elements represented the day of the week for any day in the year. Given that one Day is known, determining the "next" day is easy - the code for this is INC(Day), unless Day is currently Sunday (an attempt to INC(Sunday) should cause an execution error). This may be more safely written in terms of the MIN and MAX functions, as illustrated below. Determining New Year's Day can almost be done by reading a single character, but not quite:

    MODULE AssignDays;
    (* Assigns days of the week to all days of the year after reading
       two chars to assign the first day to Sunday .. Saturday respectively *)

      FROM EasyInOut IMPORT Read;

      TYPE
        DAYNUMBERS = CARDINAL [1 .. 365];
        DAYS = (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday);
      VAR
        Day     : DAYS;
        I       : DAYNUMBERS;
        FirstCH, SecondCH : CHAR;
        Year    : ARRAY DAYNUMBERS OF DAYS;
      BEGIN
        Read(FirstCH); Read(SecondCH);
        CASE FirstCH OF
            's', 'S' : IF (SecondCH = 'A') OR (SecondCH = 'a')
                         THEN Day := Saturday ELSE Day := Sunday END
          | 'm', 'M' : Day := Monday
          | 't', 'T' : IF (SecondCH = 'H') OR (SecondCH = 'h')
                         THEN Day := Thursday ELSE Day := Tuesday END
          | 'w', 'W' : Day := Wednesday
          | 'f', 'F' : Day := Friday
        END;
        FOR I := 1 TO 365 DO
          Year[I] := Day;
          IF Day = MAX(DAYS) THEN Day := MIN(DAYS) ELSE INC(Day) END
        END;
      (* +++ rest of code +++ *)
      END AssignDays.

An alternative way, which might appeal to some readers although it is less easily understood (and thus to be discouraged), is to perform type conversions and arithmetic along the lines of

             FOR I := 1 TO 365 DO
               Year[I] := Day; Day := VAL(DAYS, (ORD(Day) + 1) MOD 7) )
             END

Frequently all that is required is an easy conversion from single character codes to scalar types, when use may be made of a "mapping array". For example, after declaring

      TYPE
        MARITALSTATUS = (Single, Married, Widowed, Divorced);
      VAR
        Status : MARITALSTATUS;
        MAP    : ARRAY CHAR OF MARITALSTATUS;
        CH     : CHAR;

and initializing

MAP['S'] := Single; MAP['M'] := Married;
MAP['s'] := Single; MAP['m'] := Married (* and so on *)

a value for Status may be assigned by obeying the statements

Read(CH); Status := MAP[CH]

The same technique can be useful for writing simple scalars.