Computer Science 301 - 2002

Programming Language Translation

Practical for Week 7, beginning 25 March 2002 - Solutions

There was some innovative work handed in, but there were, inevitably, several cases where it appears that people had missed several important points. You can find source versions of the program solutions in the solution kit PRAC07A.ZIP on the server. They are not as heavily commented as the ones below, where I have tried to draw attention to some errors and pitfalls that seemed to come up repeatedly.

Some general comments:

(a) You should ALWAYS put your names and a brief description of the program into the source code.

(b) Several submissions had no commentary at all, and this is just unacceptable.

(c) Practical exercises are usually set with very specific objectives in mind. In this case two major objectives were that you should learn a bit about the stdio library (which I hope almost everyone did) and also about using the concept and implementation of a "set". One group went off and completely "did their own thing", which means that they might have lost out on what they should have learned, and got into deep water that I had hoped they would avoid. Please don't do this.

Task 2

The discussion of executable sizes was not carried right through by everyone, but most students seemed to get the idea. For the record, here are the sizes I obtained (exact sizes may differ from system to system):

Executable sizes derived from sources shown:

                   Using a "set"                      Using an "array"
                                    EXE   OBJ                          EXE   OBJ
  Turbo Pascal:    SIEVE.PAS       3504               SIEVE1.PAS      2960
  Free  Pascal:    SIEVE.PAS      14848               SIEVE1.PAS     14848

  JPI Modula-2     SIEVE.MOD      21275  2018         SIEVE1.MOD     21214  1955

  Using cin and cout:

  Borland C 3.1:   SIEVE.CPP      20652  1904         SIEVE1.CPP     20492  1607
  Borland C 5.5:   SIEVE.CPP     149504  4647         SIEVE1.CPP    149504  4360
  Microsoft C:     SIEVE.CPP      40960  2901         SIEVE1.CPP     40960  1734

  Using scanf and printf:

  Borland C 3.1:   SIEVE.CPP      12330  1649         SIEVE1.CPP     12154  1337
  Borland C 5.5:   SIEVE.CPP      66560  1823         SIEVE1.CPP     66048  1544
  Microsoft C:     SIEVE.CPP      32768  2647         SIEVE1.CPP     32768  1407

  Using scanf and printf after translation from Modula-2 by X2C

  Borland C 3.1:   SIEVE.C        34572  1476         SIEVE1.C       34572  1433
  Borland C 5.5:   SIEVE.C        62976  1603         SIEVE1.C       62976  1596
  Microsoft C:     SIEVE.C        45056  2297         SIEVE1.C       45056  2304

It is quite clear that the C++ compilers, certainly when using the iostream library, produce enormous code bloat. I am pretty sure that these differences are due mostly to the extent to which I/O library implementation is layered. To that end, I experimented further. Null programs like these:

    PROGRAM p;      MODULE p;     void main(void) {
    BEGIN           END p.        }
    END.

generated executable sizes of 1472 (Turbo Pascal), 12800 (Free Pascal), 1773 (Modula-2), 4204 (Borland 3.1 C++) and 47104 (Borland 5.5 C++).

I performed a further experiment, stripping out all the I/O statements from the "array" version of the sieve programs. Executable sizes were 1680 (Turbo Pascal), 13312 (Free Pascal), 1932 (Modula-2 with array range checks), 1874 (Modula-2 with no array range checks), 6378 (Borland 3.1 C++) and 47104 (Borland 5.5 C++). Clearly the 16 bit Pascal and Modula-2 compilers are much tighter, and perform very well (only adding a few hundred bytes to the "null" program base sizes).

The Borland 5.5 and Microsoft C++ compilers are 32-bit ones, rather than 16-bit ones. But even allowing for this, they suffer from bizarre code bloat for small applications. There may be command line parameters and options that one can set to try to produce tighter code, but I have not bothered to experiment further. Some (most) of the overheads may relate to the fact that they have to produces "Windows" compatible programs. Compilers like this are clearly designed with an "everyone has 128MB of memory and 60GB of disk space, and if they don't they should go and buy more" philosophy.

Similarly, the discussion of "limits on the size of the sieve" was not taken right through by many people. Several discovered that the Pascal "set" size is limited to 256 elements (which is very restrictive for this conceptually very useful data structure). The JPI Modula-2 compiler allows very large sets, because, like the set.h implementation, it represents sets by packing 8 elements to a byte. The Pascal, Modula and (I suspect) Borland 3.1 compilers had to work within an architectural limitation of earlier processors running under DOS that no data structure could exceed 64K bytes in size. While this gave the illusion that one could set up "array" based sieves of very nearly this size, in practice there is another nasty limitation, which some of you discovered and others probably missed. In 16 bit compilers the range of an int or INTEGER was limited to -32768 .. 32768 and an unsigned int (or CARDINAL in Modula-2 terminology) to 0 .. 65355. Attempts to drive the sieve larger are handicapped by this. In the case of the Pascal and Modula programs one soon gets "array index out of range" errors. In the case of the Borland 3.1 C++ system one gets what one deserves for driving without brakes - rubbish (at best) and a total collapse (at worst). On the other hand the 32 bit compilers allow very much bigger sieves as the int type has a range of about 4 billion, and the operating system and large memories in the computers support larger data structures. With a memory size of about 64M and an int size of 4 bytes I expect the system would handle arrays of up to about 8MB with no problem, and several students reported this (I am not brave or perhaps foolish enough to try pushing my own system to its limits using the inferior protection provided by C++ and Windows, sorry. I once lost a hard disk because a pointer went out of range unchecked. Not a nice situation to be in!)

On the other hand, the limitation imposed by the Clang system is entirely due to the fact that the interpreter system only allows for 8000 words of memory -which has to hold the pseudo-code and all variables. The limit on the sieve size was about 7900 I think. This could have been extended simply by modifiying the interpreter, and then recompiling it, but you have not seen how to do that (yet).

Task 3

Code follows for one of the Sieve programs in C++ using scanf/printf.

Note that the program includes an exit(1) call to abort itself if it discovers that the input data is out of range. Develop the habit of doing this sort of thing and making these sorts of checks. C compilers don't (can't) generate code to apply array bound checks, so if you write array handling programs that go out of bounds you have only yourself to blame when spectacular disasters occur as soon as your buggy programs modify memory in odd places!

  // Sieve of Eratosthenes for finding primes 2 <= n <= 4000
  // P.D. Terry,  Rhodes University, 2002

  #include "misc.h"

  const int Max = 4000;
  typedef boolean SIEVES[Max - 1];

  void main(void)
  { int i, n, k; // counters
    long primes = 0;
    SIEVES Uncrossed;   // the sieve
    printf("Supply largest number to be tested ");     // -------
    scanf("%d", &n);                                   // ------- note the &
    if (n > Max) {
      printf("n too large, sorry");                    // -------
      exit(1);                                         // abandon program
    }
    printf("Prime numbers between 2 and %d\n", n);     // -------
    printf("-----------------------------------\n");   // -------
    for (i = 2; i <= n; i++)
      Uncrossed[i-2] = true;    // clear sieve
    for (i = 2; i <= n; i++) {  // the passes over the sieve
      if (Uncrossed[i-2]) {
        if (primes % 10 == 0) putchar('\n');           // -------
        primes++;
        printf("%5d", i);                              // ------- note the 5
        // now cross out multiples of i
        k = i;
        do {
          Uncrossed[k-2] = false;
          k += i;
        } while (k <= n);
      }
    }
    exit(0);
  }

Task 4 - The Sieve in Clang

Most people successfully corrected the silly errors in SIEVE.CLN. The corrected source can be found in the prac solution kit.

Task 5 - The Sieve in Parva

Note again how the program stops if too large a value of N is supplied - many people did not do this correctly.

  void main() {
  // Sieve of Eratosthenes for finding primes 2 <= n <= 4000 (Parva version)
  // P.D. Terry,  Rhodes University, 2002
    const Max = 4000;
    boolean Uncrossed[Max];                   // the sieve
    int i, n, k;                              // counters
    int primes = 0;
    read("Supply largest number to be tested ", n);
    if (n > Max) {
      print("n too large, sorry");
      exit(1);                                // abandon program
    }
    println("Prime numbers between 2 and " , n);
    println("-----------------------------------");
    for (i = 2; i <= n; i++) {                // clear sieve
      Uncrossed[i-2] = true;
    }
    for (i = 2; i <= n; i++) {                // the passes over the sieve
      if (Uncrossed[i-2]) {
        if (primes % 8 == 0) { print("\n"); } // ensure line not too long
        primes++;
        print(i:8);
        k = i;                                // now cross out multiples of i
        do {
          Uncrossed[k-2] = false;
          k += i;
        } while (k <= n);
      }
    }
    exit(0);
  }

Task 6 - The Sieve in Java

  import java.io.*;

  class sieve {
  // Sieve of Eratosthenes for finding primes 2 <= n <= 4000 (Java version)
  // Uses Keyboard routines
  // P.D. Terry,  Rhodes University, 2002

    public static void main (String [] args) {
      final int Max = 4000;
      boolean [] Uncrossed = new boolean [Max];  // the sieve
      int i, n, k;                               // counters
      int primes = 0;
      Keyboard.prompt("Supply largest number to be tested ");
      n = Keyboard.readInt();
      if (n > Max) {
        System.out.print("n too large, sorry");
        System.exit(1);                          // abandon program
      }
      System.out.println("Prime numbers between 2 and " + n);
      System.out.println("-----------------------------------");
      for (i = 2; i <= n; i++) {                 // clear sieve
        Uncrossed[i-2] = true;
      }
      for (i = 2; i <= n; i++) {                 // the passes over the sieve
        if (Uncrossed[i-2]) {
          if (primes % 8 == 0) {                 // ensure line not too long
            System.out.println();
          }
          primes++;
          System.out.print("\t" + i);
          k = i;                                 // now cross out multiples of i
          do {
            Uncrossed[k-2] = false;
            k += i;
          } while (k <= n);
        }
      }
      System.exit(0);
    }
  }

Task 7 - High level translators

Several students complained that the C code generated by the X2C system was "unreadable", and "not what we would have written ourselves". There can be no dispute with the second of those arguments, but if you take a careful look at the generated C code, it is verbose, rather than unreadable, because long identifier names have been used. This is actually not such a bad thing - at least one can tell the "origin" of any identifier, since the original module in which it was declared is incorporated into its name, and is a very useful style, both for large programs, and (more especially) when one has to contend with the miserable rules that C has for controlling identifier name spaces. In fact, in the days when I used Modula regularly, I used a convention similar to this of my own accord, and have carried it over into my C coding, as you will see from the source in the appendices to the book. Some of the other "unreadability" presumably relates to the fact that the X2C system is obliged to translate the CARDINAL type to the unsigned int type, which is not one many of you will ever have used - this explains all those funny casts and capital U's that you saw. Some people commented that 'maintaining the C code generated would be a nightmare". Well, maybe, but the point of using a tool like this is that you can develop and maintain your programs in Modula-2 and then simply convert them to C when you want to get them compiled on some other machine.

Task 8 Creative programming - The N Queens problem

I hope everyone managed to work out how to count the solutions! Many people managed to convert it to Clang too - it looks very similar to Pascal of course, the big difference being that one has to arrange the array subscripts for array C to span a non-negative range. A depressing number of submissions had not realised this, and could not possibly have worked.

  PROGRAM Queens;
  (* Find all solutions to the N Queens problem - how do we place N queens on
     an N*N chessboard so that none of them is threatened by any other one.
     Code closely based on that by Niklaus Wirth in "Algorithms + Data
     Structures = Programs" (Prentice-Hall, 1976; page  145), but generalised
     to allow N queens, rather than only 8
     Modifications by Pat Terry, Rhodes University, 2002 *)

    CONST
      Max = 24 (* Maximum board size *);

    (* Introduce names for things wherever you can *)

      FALSE = 0;
      TRUE = 1;

    (* I have left these rather silly names untouched.  Wirth has a habit of
       choosing names like most students do - so that nobody has a clue what
       they represent and curses the author for evermore! *)
    VAR
      A[24], B[48], C[48], X[24], I, N;

    PROCEDURE DisplaySolution;
    (* Display one solution *)
      VAR
        I;
      BEGIN
        I := 1;
        WHILE I <= N DO BEGIN Write(X[I]); I := I + 1 END;
        Write;
      END;

    PROCEDURE Try (I, N);
    (* Place the I-th queen on the board of size N * N *)
      VAR
        J;
      BEGIN
        J := 1;
        WHILE J <= N DO BEGIN

    (* This "sequential conjunction" evaluation is easy and safe enough here *)

          IF A[J] * B[I+J] * C[I-J+N] = 1 THEN BEGIN
            X[I] := J;
            A[J] := FALSE; B[I+J] := FALSE;

    (* Note the way in which we index array C to make sure the subscripts are non-negative *)

            C[I-J+N] := FALSE;
            IF I < N THEN Try(I+1, N);
            IF I >= N THEN DisplaySolution;
            A[J] := TRUE; B[I+J] := TRUE; C[I-J+N] := TRUE
          END;
          J := J + 1;
        END
      END;

    BEGIN
      Read(N);

    (* We can use RETURN in the main program to EXIT if the program detects a
       gross error in the data *)

      IF N > Max THEN BEGIN Write('N too large'); RETURN END;
      I := 1; WHILE I <= N DO BEGIN A[I] := TRUE; I := I + 1 END;
      I := 1; WHILE I <= 2*N DO BEGIN B[I] := TRUE; C[I] := TRUE; I := I + 1 END;
      Try(1, N);
    END.

Here is a Parva version, modified to count the solutions, and where, for timing purposes (see later) the output statements have been commented out. Note that since in Parva one cannot pass simple variable parameters by reference, the count of the number of solutions has been held in an array of length 1 (which can be passed by reference) . Notice also that in Parva, Pascal, Clang, Modula and C++ you have to obey a "declare before use" rule, which is why the functions below appear in the order they do. Alternatively one has to use "function prototypes" to achieve the correct effect.

  void DisplaySolution (int [] x, int n) {
  // Display one solution
  /*
    for (int i = 1; i <= n; i++) { print(x[i]); }
    println();
  */
  }

  void try (int i, int n, boolean [] a, boolean [] b, boolean [] c,
            int [] x, int [] sol) {
  // Place the i-th queen on the board of size n * n
    int j = 1;
    while (j <= n)  {
      if (a[j] && b[i+j] && c[i-j+n])  {
        x[i] = j;
        a[j] = false; b[i+j] = false; c[i-j+n] = false;
        if (i < n) { try(i+1, n, a, b, c, x, sol); }
        else { sol[0]++; DisplaySolution(x, n); }
        a[j] = true; b[i+j] = true; c[i-j+n] = true;
      }
      j++;
    }
  }

  void main () {
  // Find all solutions to the n Queens problem - how do we place n queens on
  // an n*n chessboard so that none of them is threatened by any other one.
  // Code closely based on that by Niklaus Wirth in "Algorithms + Data
  // Structures = Programs" (Prentice-Hall, 1976; page 145), but generalised
  // to allow n queens, rather than only 8

  // Count solutions and iterate the process a requested number of times

  // Modifications by Pat Terry, Rhodes University, 2002

    const Max = 24; // Maximum board size
    boolean [] a = new boolean [24];
    boolean [] b = new boolean [48];
    boolean [] c = new boolean [48];
    int n, t, iterations;
    int [] x = new int [24];
    int [] sol = new int [1];

    read("Board size? ", n);
    read("Iterations? ", iterations);
    if (n > Max) { print("n too large"); return; }
    for (int count = 0; count < iterations; count++) {
      sol[0] = 0;
      for (int i = 1; i <= n; i++)  { a[i] = true; }
      for (int i = 1; i <= 2*n; i++)  { b[i] = true; c[i] = true; }
      try(1, n, a, b, c, x, sol );
    }
    print("Board size ", n, " Solutions ", sol[0], " Iterations ", iterations);
}

In Java of course we can make use of a global variable to count the number of solutions. Once again this solution has commented out the output statements for timing purposes. Note that now we have been able to go back to using "global variables" and to defining the array sizes in terms of Max to allow for easy editing in future. (A version closer to the Parva one, with the arrays passed as parameters, can be found in the solution kit).

  import java.io.*;

  class queensa {

    private static int solutions;
    static final int Max = 24; // Maximum board size
    static boolean [] a = new boolean [Max];
    static boolean [] b = new boolean [2 * Max];
    static boolean [] c = new boolean [2 * Max];
    static int [] x = new int [Max];

    private static void DisplaySolution () {
    // Display one solution
    /*
      for (int i = 1; i <= n; i++) { System.out.print(" " + x[i]); }
      System.out.println();
    */
    }

    public static void tryit (int i, int n) {
    // Place the i-th queen on the board of size n * n
      int j = 1;
      while (j <= n)  {
        if (a[j] && b[i+j] && c[i-j+n])  {
          x[i] = j;
          a[j] = false; b[i+j] = false; c[i-j+n] = false;
          if (i < n) tryit(i+1, n);
          else { solutions++; DisplaySolution(); }
          a[j] = true; b[i+j] = true; c[i-j+n] = true;
        }
        j++;
      }
    }

    public static void main (String[] args) {
    // Find all solutions to the n Queens problem - how do we place n queens on
    // an n*n chessboard so that none of them is threatened by any other one.
    // Code closely based on that by Niklaus Wirth in "Algorithms + Data
    // Structures = Programs" (Prentice-Hall, 1976; page 145), but generalised
    // to allow n queens, rather than only 8

    // Count solutions and iterate the process a requested number of times

    // Modifications by Pat Terry, Rhodes University, 2002

      Keyboard.prompt("Board size? ");
      int n = Keyboard.readInt();
      Keyboard.prompt("Iterations? ");
      int iterations = Keyboard.readInt();
      if (n > Max) { System.out.print("n too large"); return; }
      for (int count = 0; count < iterations; count++) {
        solutions = 0;
        for (int i = 1; i <= n; i++)  { a[i] = true; }
        for (int i = 1; i <= 2*n; i++)  { b[i] = true; c[i] = true; }
        tryit(1, n);
      }
      System.out.println("Board size " + n + " Solutions " + solutions + " Iterations " + iterations);
    }
  }

How many solutions are there to the N Queens problem? Do you see a trend?

        2   3    4    5    6    7    8    9     10     11     12      13
     .---------------------------------------------------------------------.
     |  0 |  0 |  2 | 10 |  4 | 40 | 92 | 352 | 724 | 2680 | 14200 | 73712 |
     `---------------------------------------------------------------------'

Clearly the number of solutions is starting to rise very rapidly. If you were very keen you might like to try to estimate the "Order" behaviour from these figures (how?).

To measure the differences in speed between the Clang and Pascal versions of the N Queens program one can try timing the execution. On some old machines at my disposal, and using a stop watch, I found that for small values of N the programs ran so fast that measuring the time was difficult. But for larger N, I obtained

   N = 11 133MHz Pentium  Pascal  1.6 seconds   Clang 100 seconds  Ratio 64:1
   N = 12 133MHz Pentium  Pascal  7.4 seconds   Clang 540 seconds  Ratio 72:1
   N = 12 266MHz Pentium  Pascal  3.7 seconds   Clang 279 seconds  Ratio 75:1

This sort of behaviour - 10 to 100 times slower - is typical of interpretive systems. I could probably speed up the Clang interpreter considerably, of course.

Trying to time things on the faster machines in the lab was problematic. The solutions above (and equivalent ones in Pascal and Clang) were hacked so as to allow the problem to be repeatedly solved. I obtained the following timings for 6000 iterations of the 8 queens problem, and for 8 iterations of finding prime numbers up to 5000:

                                                      Queens    Sieve
  Turbo Pascal (16 bit) (with array bound checks)       14.09    19.53 secs
  Turbo Pascal (16 bit) (no array bound checks)          2.51     3.69 secs
  Free Pascal (32 bit) (with array bound checks)         4.11     4.48 secs
  Free Pascal (32 bit) (no array bound checks)           2.40     2.04 secs
  Clang (Modula-2 implementation; 16 bit)              170.00   454.00 secs
  Parva (Free Pascal 32 bit implementation)            303.00   360.00 secs
  Java (JIT mode)                                        2.12     3.14 secs
  Java -Xint mode (strictly interpreted)                20.78    21.06 secs
  Borland C++  5.5  (32 bit)                             1.19          secs
  Borland C++  3.1  (16 bit)                             4.37          secs
  Microsoft C++ (32 bit)                                 2.37          secs

Clearly the 32 bit compilers do far better than the 8 bit ones, and the rather unoptimised Clang and Parva interpreter systems are far slower than the native code systems. It is interesting to see how fast the Java system runs when in the JIT (Just in time) default mode where the byte codes produced by the compiler are converted to native code as the program starts to run.