Practical 14 (discussion)

Computer Science 301 - 2001

Programming Language Translation

Practical for Week 14, beginning 23 July 2001 - solutions

The discussion of executable sizes was not carried right through by everyone, but most students seemed to get the idea. For the record, here are the sizes:

Executable sizes derived from sources shown:

                 Using a "set"                      Using an "array"

                                  EXE   OBJ                          EXE   OBJ
Turbo Pascal:    SIEVE.PAS       3504               SIEVE1.PAS      2960

JPI Modula-2     SIEVE.MOD      21282  2025         SIEVE1.MOD     21220  1960

Using cin and cout:

Borland C 3.1:   SIEVE.CPP      14932  1745         SIEVE1.CPP     14788  1463
Borland C 5.5:   SIEVE.CPP     149504  4546         SIEVE1.CPP    149504  4260
Microsoft C:     SIEVE.CPP      40960               SIEVE1.CPP     40960

Using scanf and printf:

Borland C 3.1:   SIEVE.CPP      10352  1558         SIEVE1.CPP     10192  1260
Borland C 5.5:   SIEVE.CPP      66560  1806         SIEVE1.CPP     66048  1527
Microsoft C:     SIEVE.CPP      37768               SIEVE1.CPP     37768

Using scanf and printf after translation from Modula-2 by X2C

Borland C 3.1:   SIEVE.C        30614  1417         SIEVE1.C       30630  1429
Borland C 5.5:   SIEVE.C        62976  1590         SIEVE1.C       62976  1583
Microsoft C:     SIEVE.C        45056               SIEVE1.C       45056

It is quite clear that the C++ compilers, certainly when using the iostream library, produce enormous code bloat. I am pretty sure that these differences are due mostly to the extent to which I/O library implementation is layered. To that end, I experimented further. Null programs like these:

    PROGRAM p;      MODULE p;     void main(void) {
    BEGIN           END p.        }
    END.

generated executable sizes of 1472 (Pascal), 1773 (Modula-2), 4204 (Borland 3.1 C++) and 47104 (Borland 5.5 C++).

I performed a further experiment, stripping out all the I/O statements from the "array" version of the sieve programs. Executable sizes were 1680 (Pascal), 1932 (Modula-2 with array range checks), 1874 (Modula-2 with no array range checks), 6378 (Borland 3.1 C++) and 47104 (Borland 5.5 C++). Clearly the Pascal and Modula-2 compilers are much tighter, and perform very well (only adding a few hundred bytes to the "null" program base sizes).

The Borland 5.5 and Microsoft C++ compilers are 32-bit ones, rather than 16-bit ones. But even allowing for this, they suffer from bizarre code bloat for small applications. There may be command line parameters and options that one can set to try to produce tighter code, but I have not bothered to experiment further. Some (most) of the overheads may relate to the fact that they have to produces "Windows" compatible programs. Compilers like this are clearly designed with an "everyone has 128MB of memory and 6GB of disk space, and if they don't they should go and buy more" philosophy.

Several students complained that the C code generated by the X2C system was "unreadable", and "not what we would have written ourselves". There can be no dispute with the second of those arguments, but if you take a careful look at the generated C code, it is verbose, rather than unreadable, because long identifier names have been used. This is actually not such a bad thing - at least one can tell the "origin" of any identifier, since the original module in which it was declared is incorporated into its name, and is a very useful style, both for large programs, and (more especially) when one has to contend with the miserable rules that C has for controlling identifier name spaces. In fact, in the days whe I used Modula regularly, I used a convention similar to this of my own accord, and carried it over into my C coding, as you will see from the source in the appendices to the book. Some of the other "unreadability" presumably relates to the fact that the X2C system is obliged to translate the CARDINAL type to the unsigned int type, which is not one many of you will ever have used - this explains all those funny casts and capital U's that you saw.

Similarly, the discussion of "limits on the size of the sieve" was not taken right through by many people. Several discovered that the Pascal "set" size is limited to 256 elements (which is very restrictive for this conceptually very useful data structure). The JPI Modula-2 compiler allows very large sets, because, like the set.h implementation, it represents sets by packing 8 elements to a byte. The Pascal, Modula and (I suspect) Borland 3.1 compilers had to work within an architectural limitation of earlier processors running under DOS that no data structure could exceed 64K bytes in size. While this gave the illusion that one could set up "array" based sieves of very nearly this size, in practice there is another nasty limitation, which some of you discovered and others probably missed. In 16 bit compilers the range of an int or INTEGER was limited to -32768 .. 32768 and an unsigned int (or CARDINAL in Modula-2 terminology) to 0 .. 65355. Attempts to drive the sieve larger are handicapped by this. In the case of the Pascal and Modula programs one soon gets "array index out of range" errors. In the case of the Borland 3.1 C++ system one gets what one deserves for driving without brakes - rubbish (at best) and a total collapse (at worst). On the other hand the 32 bit compilers allow very much bigger sieves as the int type has a range of about 4 billion, and the operating system and large memories in the computers support larger data structures. With a memory size of about 64M and an int size of 4 bytes I expect the system would handle arrays of up to about 8MB with no problem, and several students reported this (I am not brave or perhaps foolish enough to try pushing my own system to its limits using the inferior protection provided by C++ and Windows, sorry. I once lost a hard disk because a pointer went out of range unchecked. Not a nice situation to be in!) On the other hand, the limitation imposed by the Clang system is entirely due tot he fact that the interpreter system only allows for 8000 words of memory - which has to hold the pseudo-code and all variables. The limit on the sieve size was about 7900 I think. This could have been extended simply by modifiying the interpreter, and then recompiling it, but you have not seen how to do that (yet).

Code follows for the Sieve programs in C++ using scanf/printf:

Note that the program includes an exit(1) call to abort itself if it discovers that the input data is out of range. Develop the habit of doing this sort of thing and making these sorts of checks. C compilers don't (can't) generate code to apply array bound checks, so if you write array handling programs that go out of bounds you have only yourself to blame when spectacular disasters occur as soon as your buggy programs modify memory in odd places!




  // Sieve of Eratosthenes for finding primes 2 <= n <= 4000
  // P.D. Terry,  Rhodes University, 2001

  #include "misc.h"

  const int Max = 4000;
  typedef boolean SIEVES[Max - 1];

  void main(void)
  { int i, n, k; // counters
    long primes = 0;
    SIEVES Uncrossed;   // the sieve
    printf("Supply largest number to be tested ");
    scanf("%d", &n);
    if (n > Max) {
      printf("n too large, sorry");
      exit(1);
    }
    printf("Prime numbers between 2 and %d\n", n);
    printf("-----------------------------------\n");
    for (i = 2; i <= n; i++)
      Uncrossed[i-2] = true;    // clear sieve
    for (i = 2; i <= n; i++) {  // the passes over the sieve
      if (Uncrossed[i-2]) {
        if (primes % 10 == 0) putchar('\n'); // ensure line not too long
        primes++;
        printf("%5d", i);
        // now cross out multiples of i
        k = i;
        do {
          Uncrossed[k-2] = false;
          k += i;
        } while (k <= n);
      }
    }
    exit(0);
  }



  // Sieve of Eratosthenes for finding primes 2 <= n <= 4000
  // P.D. Terry,  Rhodes University, 2001

  #include "misc.h"
  #include "set.h"

  const int Max = 4000;
  typedef Set<Max> SIEVES;

  void main(void)
  { int i, n, k; // counters
    long primes = 0;
    SIEVES Uncrossed;   // the sieve
    printf("Supply largest number to be tested ");
    scanf("%d", &n);
    if (n > Max) {
      printf("n too large, sorry");
      exit(1);
    }
    printf("Prime numbers between 2 and %d\n", n);
    printf("-----------------------------------\n");
    Uncrossed.fill();
    for (i = 2; i <= n; i++) {  // the passes over the sieve
      if (Uncrossed.memb(i)) {
        if (primes % 10 == 0) putchar('\n'); // ensure line not too long
        primes++;
        printf("%5d", i);
        // now cross out multiples of i
        k = i;
        do {
          Uncrossed.excl(k);
          k += i;
        } while (k <= n);
      }
    }
    exit(0);
  }

The lucky number program was rather badly done by many people. For a start, you had been asked to write a function that passed the list back in an array parameter - some groups had simply hacked together a main routine only. The exercise was supposed to heighten your awareness of array handling and parameter passing, and so in that respect you may have lost out; in which case study the code below carefully.

Several solutions submitted produced lists with completely the wrong numbers, although perhaps they might have "looked" okay. Others did a huge amount of extra work, marking an array, then sweeping it all over again to extract numbers into a second array, then sweeping again to copy back into the original array.

Underneath every apparently complex problem there is often a simple solution. Seeking simplicity and order is what science is all about, so try to get into the habit. Unfortunately, the modern trend towards cut-and-paste interactive environments encourages a hack-the-code-any-old-how mentality!

  // Simple test bed program for checking on the generation of lucky numbers
  // P.D. Terry, Rhodes University, 2001

  #include "misc.h"
  #include <iostream.h>

  const int Max = 4000;
  typedef int LISTS[Max];

  void GenerateLuckyNumbers(int LuckyList[], int &Top)
  // Generate a LuckyList of the lucky numbers between 1 and Top (inclusive)
  { int I, Copied, Step = 2;

    if (Top > Max) Top = Max;       // check for potential disaster
    for (I = 1; I <= Top; I++)      // set up initial list
      LuckyList[I-1] = I;           
    while (Step <= Top) {           // sweep and copy down elements
      Copied = 0;
      for (I = 1; I <= Top; I++) {
        if (I % Step != 0) {        // only copy every Step-th element
          Copied++;                 // but count how many are copied
          LuckyList[Copied-1] = LuckyList[I-1];
        }
      }
      Top = Copied;                 // new limit on the length of list
      Step++;
    }
  }

  void main(void) {
    int I, Limit;
    LISTS TestList;

    cout << "Supply limit of numbers to be tested for luck ";
    cin >> Limit;
    GenerateLuckyNumbers(TestList, Limit);
    for (I = 1; I <= Limit; I++) {
      cout.width(5); cout << TestList[I-1];
    }
    exit(0);
  }

Here is a Pascal solution, for interest:

  PROGRAM Lucky;
  (* Simple test bed program for checking on the generation of lucky numbers
     P.D. Terry, Rhodes University, 2001 *)

    CONST
      Max = 1000;

    TYPE
      LISTS = ARRAY [1 .. Max] OF INTEGER;

    PROCEDURE GenerateLuckyNumbers (VAR LuckyList : LISTS; VAR Top : INTEGER);
    (* Generate a LuckyList of the lucky numbers between 1 and Top (inclusive *)
      VAR
        I, Step, Copied : INTEGER;

      BEGIN
        IF Top > Max THEN Top := Max;     (* check for potential disaster *)
        FOR I := 1 TO Top DO              (* set up initial list *)
          LuckyList[I] := I;
        Step := 2;
        WHILE Step <= Top DO BEGIN        (* sweep and copy down elements *)
          Copied := 0;
          FOR I := 1 TO Top DO BEGIN
            IF I MOD Step <> 0
              THEN                        (* only copy every Step-th element *)
                BEGIN
                  INC(Copied);            (* but count how many are copied *)
                  LuckyList[Copied] := LuckyList[I]
                END
          END;
          Top := Copied;                  (* new limit on the length of list *)
          INC(Step)
        END;
      END;

    VAR
      I, Limit : INTEGER;
      TestList : LISTS;

    BEGIN
      Write('Supply limit of numbers to be tested for luck ');
      Read(Limit);
      GenerateLuckyNumbers(TestList, Limit);
      FOR I := 1 TO Limit DO BEGIN
        Write(TestList[I]:5)
      END
    END.

  PROGRAM Luck1;
  (* Simple test bed program for checking on the generation of lucky numbers
     P.D. Terry, Rhodes University, 2001 *)

    CONST
      Max = 1000;

    FUNCTION GenerateLuckyNumbers (LuckyList[], Top);
    (* Generate a LuckyList of the lucky numbers between 1 and Top (inclusive *)
      VAR
        I, Step, Copied;

      BEGIN
        IF Top > Max THEN Top := Max;     (* check for potential disaster *)
        I := 1;
        WHILE I <= Top DO                 (* initial list *)
          BEGIN LuckyList[I] := I; I := I + 1 END;
        Step := 2;
        WHILE Step <= Top DO BEGIN        (* sweep and copy down elements *)
          Copied := 0;
          I := 1;
          WHILE I <= Top DO BEGIN
            IF I / Step * Step <> I THEN  (* I MOD Step = 0 *)
              BEGIN                       (* only copy every Step-th element *)
                Copied := Copied + 1;     (* but count how many are copied *)
                LuckyList[Copied] := LuckyList[I]
              END;
            I := I + 1;
          END;
          Top := Copied;                  (* new limit on the length of list *)
          Step := Step + 1
        END;
        RETURN Top;
      END;

    VAR
      I, Limit, TestList[1000];

    BEGIN
      Write('Supply limit of numbers to be tested for luck '); Read(Limit);
      Limit := GenerateLuckyNumbers(TestList, Limit);
      I := 1;
      WHILE I <= Limit DO BEGIN
        Write(TestList[I]); I := I + 1
      END
    END.

The sorting program clearly caused much confusion. I was delighted to see that most people knew where to look in their excellent second year course notes to find the algorithms. Unfortunately the version of the insertion sort you found did not translate easily, as it used an AND operator in the control of a WHILE loop, which the present version of Clang does not support. Sadly not everyone managed to find a clean (or indeed, any) solution to this problem. Consequently, various versions of the insertion sort are given below, as well as two versions of the bubble sort.

The Insertion sort can be made very simple if you work "downwards" rather than "upwards", and use List[0] as a sentinel (see Insert1Sort below). This allows one to avoid the potential AND operation altogether - and still to stay safely within the bounds of the array at all times. Of course, this relies on using array elements List[1] ... List[N] to store the N elements of the list rather than List[0] ... List[N-1], but that is a small price to pay, and, perhaps, makes all the algorithms simpler in any case.

A feature of many programs submitted was a cavalier approach to declaring far too many variables "global". Try to declare variables as locally as possible.

Clearly several people have little idea of how and why one must use semaphores to protect critical resources (like the overall counter, and, indeed, access to the printer).

Finally, the level of commenting in most of your code leaves a lot to be desired. One can overcomment simple code, but there should always be some sort of indication for each routine of what it is intended to do, expressed in terms of its parameters. And code should always have your names buried in it at the top.

  PROGRAM CompareSorts;
  (* Compare speed of sorting by using several sorts in parallel;
     also count number of array element assignments performed
     P.D. Terry, Rhodes University, 2001 *)

    VAR
      Assignments,
      Mutex, Printer (* Semaphore *);

    PROCEDURE Increment(x);
    (* Protected increment of global variable Assignments *)
      BEGIN
        WAIT(Mutex);
        Assignments := Assignments + x;
        SIGNAL(Mutex)
      END;

    PROCEDURE WriteList(List[], N);
    (* Display N elements in List[1] ... List[N] *)
      VAR
        I;
      BEGIN
        I := 1;
        WHILE I <= N DO BEGIN
          Write(List[I]); I := I + 1
        END;
      END;

    PROCEDURE BubbleSort (List [], N);
    (* Sort List[1] ... List[N] into ascending order *)
      VAR
        I, J, T;
      BEGIN
        I := 1;
        WHILE I <= N-1 DO BEGIN
          J := I + 1;
          WHILE J <= N DO BEGIN
            IF List[I] > List[J] THEN BEGIN
              T := List[I]; List[I] := List[J]; List[J] := T;
              Increment(3);
            END;
            J := J + 1
          END;
          I := I + 1
        END;
        WAIT(Printer);
        Write('BubbleSort completed');
        SIGNAL(Printer);
      END;

(* The "better" bubblesort achieves much better performance in some cases - in
   particular when the data is highly ordered to begin with.  It attempts to
   cut down the effective length of list to scan as much as possible on each
   pass *)

    PROCEDURE BetterBubbleSort (List [], N);
    (* Sort List[1] ... List[N] into ascending order *)
      VAR
        I, Length, LastChange, T;
      BEGIN
        Length := N;
        WHILE Length > 1 DO BEGIN
          LastChange := 1 (* optimistic *);
          I := 1;
          WHILE I <= Length - 1 DO BEGIN
            IF List[I] > List[I+1] THEN BEGIN
              T := List[I]; List[I] := List[I+1]; List[I+1] := T;
              Increment(3);
              LastChange := I
            END;
            I := I + 1;
          END;
          Length := LastChange
        END;
        WAIT(Printer);
        Write('BetterBubbleSort completed');
        SIGNAL(Printer);
      END;

(* The following variation on the insertion algorithm uses List[0] very
   effectively as a sentinel, making the WHILE loop condition as simple as
   possible *)

    PROCEDURE Insert1Sort (List [], N);
    (* Sort List[1] ... List[N] into ascending order *)
      VAR
        I, J;
      BEGIN
        I := 2;
        WHILE I <= N DO BEGIN
          List[0] := List[I]; Increment(1);
          J := I - 1;
          WHILE List[0] < List[J] DO BEGIN
            List[J+1] := List[J]; Increment(1);
            J := J - 1;
          END;
          List[J+1] := List[0]; Increment(1);
          I := I + 1
        END;
        WAIT(Printer);
        Write('Insert1Sort completed');
        SIGNAL(Printer);
      END;

(* The next variation demonstrates a classic "state variable" approach to
   handling the WHILE loop *)

    PROCEDURE Insert2Sort (List [], N);
    (* Sort List[1] ... List[N] into ascending order *)
      CONST
        TRUE = 1;
        FALSE = 0;
      VAR
        I, J, Temp, Searching;
      BEGIN
        I := 2;
        WHILE I <= N DO BEGIN
          Temp := List[I]; Increment(1);
          J := I - 1;
          Searching := TRUE;
          WHILE Searching = TRUE DO BEGIN
            IF Temp >= List[J] THEN Searching := FALSE;
            IF Temp < List[J] THEN BEGIN
              List[J+1] := List[J]; Increment(1);
              J := J - 1;
              IF J = 0 THEN Searching := FALSE;
            END
          END;
          List[J+1] := TEMP; Increment(1);
          I := I + 1
        END;
        WAIT(Printer);
        Write('Insert2Sort completed');
        SIGNAL(Printer);
      END;

(* The next alternative tries to overcome the lack of an AND operator by
   using a (nested) BOOLEAN function which can produce the correct "short
   circuit" effect by clever use of the RETURN statement *)

    PROCEDURE Insert3Sort (List [], N);
    (* Sort List[1] ... List[N] into ascending order *)
      CONST
        TRUE = 1;
        FALSE = 0;

      FUNCTION MustShift (J, Temp);
        BEGIN
          IF J = 0 THEN RETURN FALSE;
          IF Temp > List[J] THEN RETURN FALSE;
          RETURN TRUE
        END;

      VAR
        I, J, Temp, Searching;
      BEGIN
        I := 2;
        WHILE I <= N DO BEGIN
          Temp := List[I]; Increment(1);
          J := I - 1;
          WHILE MustShift(J, Temp) = TRUE DO BEGIN
            List[J+1] := List[J]; Increment(1);
            J := J - 1;
          END;
          List[J+1] := TEMP; Increment(1);
          I := I + 1
        END;
        WAIT(Printer);
        Write('Insert3Sort completed');
        SIGNAL(Printer);
      END;

    FUNCTION EqualList (L1[], L2[], N);
    (* Return TRUE iff L1[I] = L2[I] for all I in 1 ... N *)
      CONST
        FALSE = 0;
        TRUE = 1;
      VAR
        I, Equal;
      BEGIN
        Equal := TRUE;
        I := 1;
        WHILE I <= N DO BEGIN
          IF L1[I] <> L2[I] THEN Equal := FALSE;
          I := I + 1;
        END;
        RETURN Equal
      END;

    FUNCTION InOrder (L1[], N);
    (* Return TRUE iff L1[I] < L1[I+1] for all I in 1 ... N-1 *)
      CONST
        FALSE = 0;
        TRUE = 1;
      VAR
        I, Okay;
      BEGIN
        I := 1;
        WHILE I < N DO BEGIN
          IF L1[I] > L1[I+1] THEN RETURN FALSE;
          I := I + 1;
        END;
        RETURN TRUE
      END;

(* Note how we declare the rest of the global variables as close as possible to
   the point where they are "used" *)

    VAR
      List1[500], List2[500], List3[500], List4[500], List5[500], N, Data;

    BEGIN
      Printer := 1; Mutex := 1; Assignments := 0;
      N := 0; Read(Data);
      WHILE Data <> 0 DO BEGIN
        N := N + 1;
        IF N > 500 THEN BEGIN
          Write('Too much data'); RETURN
        END;
        List1[N] := Data; List2[N] := Data;
        List3[N] := Data; List4[N] := Data;
        List5[N] := Data;
        Read(Data)
      END;

(* we must supply each routine with its own copy of the list, as arrays are
   passed by reference *)

      COBEGIN
        Insert1Sort(List1, N);
        Insert2Sort(List2, N);
        Insert3Sort(List3, N);
        BubbleSort(List4, N);
        BetterBubbleSort(List5, N)
      COEND;
      Write(Assignments, ' assignments');
      Write('EqualList = ', EqualList(List1, List2, N),
            'Inorder = ', InOrder(List1, N));
    END.