Prac 19 solutions

Computer Science 301 - 2004

Programming Language Translation

Practical for Week 19, beginning 30 August 2004 - Solutions

The submissions received were very varied, and on the whole of rather disappointing quality. I gained the impression that several groups had left things far too late, or had simply not bothered. On the other hand, there was some innovative work handed in, but even here there was evidence that people had missed several important points. You can find complete source versions of the program solutions in the solution kit PRAC19A.ZIP on the server. This file also contains C# versions of the solutions for people who might be interested, and a number of variations on the solutions.

Some general comments:

(a) You should always put your names and a brief description of the program into the source code.

(b) Several submissions had almost no commentary at all, and this is just unacceptable. In particular, supply commentary at the start of each method as to what it sets out to do, and on the significance of the parameters/arguments.

(c) Several submissions in this prac suggested that you have all learned to comment out code stolen from elsewhere, rather than deleting it properly when it has no bearing on the task in hand. That is really very sloppy, and makes for lots of confusion on the part of people trying to work out what you are really trying to do.

(d) The pracs in this course are deliberately set to extend you, in the hope that you will learn a lot from each one. Their completion requires that you apply yourself steadily throughout the week, and not just on Thursday afternoon and the following Thursday morning!

(e) A large number of submissions were received that had not made use of the InFile and OutFile classes as you had been told to do. These library classes are designed to make text based I/O as simple as possible without having to deal with buffered readers, exceptions, string tokenizers and all that stuff that you were probably subjected to in CSC 102, but without realising that the best thing to do with bizarre code is to hide its details in a well-designed library. Have a look at the solutions below, where hopefully you will see that the I/O handling has been made very simple indeed.

Tasks 2 to 7 - The Sieve of Eratosthenes

The first tasks were fairly straightforward. Because the Pascal compiler only uses 16-bit arithmetic for INTEGER type it may appear to allow sieve sizes of about 65000. However, since it regards an integer as a signed 16-bit number, an extra limitation is imposed - an array indexed by an integer cannot have an upper subscript greater than 32767. It's even worse than that, as can be seen by considering code like:

            K := I (* now cross out multiples of I *);
            REPEAT
              Uncrossed[K] := FALSE; K := K + I
            UNTIL (K > N);

When I becomes large enough, K+I really should become larger than 32767, but the overflow means that it appears to go negative (think back to your CSC 201 course). This happens for the first time after detecting the prime number 16411, so that the maximum effective sieve with the code above is really only 16410.

We can extend the range of the algorithm by a trick which I did not expect you to discover, but which may be worth pointing out. Replace the above code by

            K := I (* now cross out multiples of I *);
            REPEAT
              Uncrossed[K] := FALSE; K := K + I
            UNTIL (K > N) OR (K < I)

which looks ridiculous, but when K gets too large and then overflows, since it appears to become negative it will then also appear to be less than I.

All rather subtle - up till now you probably have not really written or run serious code that falls foul of overflow or rounding errors, but they can be very awkward in serious applications.

There were several specious reasons thought up to explain why the executables were of such differing sizes. It is not true that this is a function of the sieve size to any marked degree, as the code and data areas are handled separately. The real reasons are that the efficiency of code generation differs markedly from one compiler to another, and that some implementations build in a great deal more extraneous code than others - you could see this in the smaller executable when some compilers were run in "optimizing" mode. The C and C++ executables differ enormously in size - no doubt due to the vast amounts of code needed to support the iostream library.

The Borland 5.5 compiler is a 32-bit one, rather than 16-bit. But even allowing for this, it suffers from bizarre code bloat for small applications. There may be command line parameters and options that one can set to try to produce tighter code, but I have not bothered to experiment further. Some (most) of the overheads may relate to the fact that it has to produces "Windows" compatible programs. Compilers like this are clearly designed with an "everyone has 256MB of memory and 60GB of disk space, and if they don't they should go and buy more" philosophy.

The "executable" produced from the C# compiler are not true executables in the same sense as those produced by the C++ and Pascal compilers, as the code in them still has to be "Jitted" into its final form.

The limitation imposed by the Parva system on the sieve size is entirely due to the fact that the interpreter system only allows for about 50000 words of memory - which has to hold the pseudo-code and all variables. The limit on the sieve size was about 49000. This could have been extended simply by modifying the interpreter, and then recompiling it, but you were not in a position to do that.

  .------------------------------------------------------------------------.
  | TASK 2          | Sieve         | Fibo         |                       |
  |                 | Code Size     | Code Size    |                       |
  |                 |               |              |                       |
  | Free Pascal     |   15 872      |    14 336    | Sieve limit  16410    |
  |-----------------+---------------+--------------+-----------------------|
  | Optimized       |               |              |                       |
  | Free Pascal     |   14 848      |    14 336    | Sieve limit  16410    |
  |-----------------+---------------+--------------+-----------------------'
  |                 |               |              |
  | Borland C 5.5   |   66 560      |    66 048    |
  |-----------------+---------------+--------------|
  |                 |               |              |
  | Borland C++ 5.5 |  149 504      |   148 480    |
  |-----------------+---------------+--------------|
  |                 |               |              |
  | C#              |   28 672      |    28 672    |
  |-----------------+---------------+--------------+-----------------------.
  |                 |               |              |                       |
  | Parva           |   N/A         |       N/A    | Sieve limit  49000    |
  |-----------------+---------------+--------------+-----------------------'
  | Modula-2 via    |               |              |
  | Borland C 5.5   |   62 976      |    62 464    |
  `------------------------------------------------'

The Sieve in Parva

The corrected code is very simple:

  void main() {
  // Sieve of Eratosthenes for finding primes 2 <= n <= 49000 (Parva version)
  // P.D. Terry,  Rhodes University, 2004
    const Max = 49000;
    bool[] Uncrossed = new bool[Max];         // the sieve
    int i, n, k, it, iterations, primes = 0;  // counters
    read("How many iterations? ", iterations);
    read("Supply largest number to be tested ", n);
    if (n > Max) {
      write("n too large, sorry");
      return;
    }
    it = 1;
    while (it <= iterations) {
      primes = 0;
      write("Prime numbers between 2 and " , n, "\n");
      write("-----------------------------------\n");
      i = 2;
      while (i <= n) {                          // clear sieve
        Uncrossed[i-2] = true;
        i = i + 1;
      }
      i = 2;
      while (i <= n) {                          // the passes over the sieve
        if (Uncrossed[i-2]) {
          if (primes - (primes/8)*8 == 0)
            write("\n");                        // ensure line not too long
          primes = primes + 1;
          write(i, "\t");
          k = i;                                // now cross out multiples of i
          Uncrossed[k-2] = false;
          k = k + i;
          while (k <= n) {
            Uncrossed[k-2] = false;
            k = k + i;
          }
        }
        i = i + 1;
      }
      it = it + 1;
      write("\n");
    }
    write(primes, " primes");
  }

Task 7 - High level translators

Several students complained that the C code generated by the X2C system was "unreadable", and "not what we would have written ourselves". There can be no dispute with the second of those arguments, but if you take a careful look at the generated C code, it is verbose, rather than unreadable, because long identifier names have been used. This is actually not such a bad thing - at least one can tell the "origin" of any identifier, since the original module in which it was declared is incorporated into its name, and this is is a very useful trick, both for large programs, and (more especially) when one has to contend with the miserable rules that C has for controlling identifier name spaces. In fact, in the days when I used Modula regularly, I used a convention similar to this of my own accord, and have often carried it over into my other coding. Some of the other "unreadability" presumably relates to the fact that the X2C system is obliged to translate the CARDINAL type to the unsigned int type, which is not one some of you will ever have used - this explains all those funny casts and capital Us that you saw. Some people commented that "maintaining the C code generated would be a nightmare". Well, maybe, but the point of using a tool like this is that you can develop and maintain your programs in Modula-2 and then simply convert them to C when you want to get them compiled on some other machine.

Task 8 - How fast/slow are various implementations?

The times (seconds) taken to execute the various programs are shown below (figures in brackets are for runs on my 1.04GHz laptop, others for the 2.41 GHz lab machines). In all cases the systems ran on Windows XP.

  .--------------------------------------------------------------.
  | Sieve: Iterations  10000       | Size of sieve  16000        |
  |--------------------------------------------------------------|
  | Fibo:  Upper limit   40  |  Sieve          |   Fibo          |
  |--------------------------+-----------------+-----------------|
  | Free Pascal              | (14.54)  8.80   |  (11.6)  5.83   |
  |--------------------------+-----------------+-----------------|
  | Free Pascal (optimized)  | (8.67)   5.28   |  (11.6)  5.79   |
  |--------------------------+-----------------+-----------------|
  | C++                      | (4.05)   2.18   |  (12.3)  5.03   |
  |--------------------------+-----------------+-----------------|
  | Modula-2 (via C)         | (25.67)  8.55   |  (17.9)  5.12   |
  |--------------------------+-----------------+-----------------|
  | C#                       | (3.70)   2.34   |  (9.45)  4.70   |
  |--------------------------+-----------------+-----------------|
  | Java                     | (8.49)   2.97   |  (9.78)  5.77   |
  |--------------------------+-----------------+-----------------|
  | Parva                    | (1050)   530    |  (692)   345    |
  |--------------------------+-----------------+-----------------|
  | Parva (optimized)        | (760)    397    |          345    |
  `--------------------------------------------------------------'

We note several points of interest:

(a) The Parva interpreted system is about two orders of magnitude slower than the native-code systems. Even here we can see the benefits of using an optimizing system, simple though this is (later we shall discuss this in more detail).

(b) In deriving these figures some experimentation was first needed so as to find parameters that would yield times sufficiently long to make comparisons meaningful. The times were obtained by simple use of a stopwatch, and of course there is always an element of reaction time in such measurements. The output statements were commented out so that all that was really being measured was the time for the algorithms themselves.

(c) The C# system seems consistently better than the Java system. Both of these employ a JIT (just in time) phase in compilation, rather than being interpreted.

(d) Even allowing for the reaction time phenomenon, there are some strange anomalies here. One might expect the execution times to be very closely related to the processor clock speeds, and that the ratio of times measured on the laptop and the lab machines for each application would have been the same, but clearly they are not. I expect that the differences - which are quite marked - can be put down to the different interior architectures of the processors themselves, but I have not had time to explore this further.

Task 10 - Goldbach's Conjecture.

Most submissions had grasped the concept of setting up a set to contain the prime numbers, but there was sometimes very tortuous/confused code thereafter for the very simple task of trying to find whether an even number could be expressed as the sum of two of those primes. You only need one loop for each attempt - if N is to be expressed as the sum of two numbers A and B, then there is no need to try out all possible combinations of A and B (since B must = N - A). Additionally, we need only try values for A from 2 to N/2 (think about it!). One way of programming this exercise would be as below. Note the way in which the sieve has been constructed from a simple adaptation of the code given to you earlier. There is no need for the primes method to do any output, or even to count how many prime numbers are added to the set.

  import Library.*;
  import java.util.*;

  class Goldbach {
  // Sieve of Eratosthenes for testing Goldbach's conjecture
  // P.D. Terry,  Rhodes University, 2004

    public static void main(String [] args) {
      IO.write("Supply largest number to be tested ");
      int limit = IO.readInt();        // limit of test
      SymSet primeSet = primes(limit);
      boolean Goldbach = true;         // optimistic
      int test = 4;
      while (Goldbach && test <= limit) {
        boolean found = false;         // try to find the pair
        int i = 2;
        while (i <= test / 2 && !found) {
          if (primeSet.contains(i) && primeSet.contains(test - i)) {
            found = true;
            IO.writeLine(" " + test + "\t" + i + "\t" + (test - i));
          }
          else i++;
        }
        if (!found) {
          IO.writeLine("conjecture fails for ", test);
          Goldbach = false;
        }
        test = test + 2;               // move on to next even number
      }
      IO.writeLine("Conjecture seems to be " + Goldbach);   // final result of test
    }

    static SymSet primes(int max)  {
    // Returns the set of prime numbers smaller than max
      SymSet primeSet = new SymSet();  // the prime numbers
      SymSet crossed = new SymSet();   // the sieve
      for (int i = 2; i <= max; i++) { // the passes over the sieve
        if (!crossed.contains(i)) {
          primeSet.incl(i);
          int k = i;                   // now cross out multiples of i
          do {
            crossed.incl(k);
            k += i;
          } while (k <= max);
        }
      }
      return primeSet;
    }

  }

Task 11 - Happy snapshots

Submissions here varied from those that had not really even got started, through some very nice clean ones, to some that were almost unbelievably complicated and whose authors had sadly not sought after elegance and simplicity. There is always a simpler and more elegant solution. Believe me!

Here is my suggested solution. Besides being simpler than most submitted, it also does some error checking. Few of you thought to do this, producing solutions that might have worked if the users had treated them gently, but failing to react sensibly to silly requests to arrange 0 students, or not providing N names/heights after asking to arrange N students. I was intrigued that several people had thought of using a square root as the basis for choosing an "intelligent" layout. Perhaps the tutors had tipped them off, but if not, well done!

  // Determine optimum layout for class photograph
  // P.D. Terry, Rhodes University, 2004

  import java.util.*;
  import Library.*;

  class Student {

    // we need their names and heights, and to record whether selected
    public String name = "";
    public double height;
    public boolean selected = false;

    public Student(double h, String s) {
      this.height = h;
      this.name = s;
    }

  } // Student

  class Photo {

    static Student[] students;    // static fields for convenience
    static int size;

    static Student nextTallest() {
    // Returns the next tallest available student in the list and then sets the
    // selected flag to prevent this student from being chosen again
      int maxPos = 0;
      double maxHeight = 0.0;
      for (int i = 0; i < size; i++)
        if (!students[i].selected && students[i].height > maxHeight) {
          maxPos = i;
          maxHeight = students[maxPos].height;
        }
      students[maxPos].selected = true;
      return students[maxPos];
    }

    static void writeRow(OutFile results, int n) {
    // Sets up and then displays the next row of n students
    // a typical sequence for "next" would be    4  5  3  6  2  7  1  8  0
      Student[] row = new Student[n];
      int sign = 1, step = 1, next = (n + 1) / 2 - 1; // start in the middle
      for (int i = 1; i <= n; i++) {
        row[next] = nextTallest();                    // fill this position
        next = next + sign * step;                    // and prepare for the next position
        sign = - sign;                                // swap to other side of centre
        step++;                                       // and move out one position
      }
      for (int i = 0; i < n; i++) results.write(row[i].name, -8);
      results.writeLine();
      for (int i = 0; i < n; i++) results.write(row[i].height, 8);
      results.writeLine();
      results.writeLine();
    }

    public static void main(String[] args) {
    // first check that commandline arguments have been supplied
    // attempt to open data file
    // attempt to open results file
    // all this as in SampleIO.java - see full solution for details

    // attempt to read the data
      int maxSize;
      do {
        IO.writeLine("Supply class size (>= 1)");
        maxSize = IO.readInt();                       // there may not really be that many
      } while (maxSize <= 0);                         // handle stupid abuse
      students = new Student[maxSize];                // but prepare for that number
      size = 0;
      do {
        students[size] = new Student(data.readDouble(), data.readLine());
        if (data.noMoreData()) break;                 // data ran out!
        size++;
      } while (size < maxSize);
      if (size != maxSize)                            // report the miscalculation
        IO.writeLine("Class appears to have only " + size + " students");
      if (size == 0) {                                // another safety check
        IO.writeLine("Cannot arrange a class with no students!");
        System.exit(1);
      }

    // compute the number to be put into most rows
      int rows = (int) (0.5 * (Math.sqrt(size) + 1.0));
      int onRow = size / rows;                        // most rows will have this number

    // and get on with the arrangement proper
      int toBeSeated = size;                          // still to be placed
      while (toBeSeated >= 2 * onRow) {               // for all but the front row
        writeRow(results, onRow);                     // print out that row
        toBeSeated -= onRow;                          // and prepare for the next one
      }
      writeRow(results, toBeSeated);                  // front row may be a bit longer
      results.close();                                // close the file safely
      IO.writeLine("View the map created in " + args[1]);
    }

  } // Photo

There are a few further observations that can be made:

(a) There is no real need to sort the data before distributing it, although clearly this is a posible alternative approach. The point is that a sort into strictly ascending or descending order still only gets you part way to solving the zig-zag distribution problem.

(b) The algorithm above for distributing the data is O(n²). For small classes this would be quite acceptable. One could improve the efficiency a bit by not merely marking a student as selected, but by creating a dynamic structure and then actually eliminating a student once he or she had been selected. In this way the list of students still to be placed would get shorter and shorter on each pass. But I suspect the operation of elimination might add as much overhead as it was trying to save, unless the class were really quite big. You might like to think of how to implement this by using an ArrayList rather than a fixed array.

Task 12 - Pig Latin

A naive solution for this can be effected quite easily. The code below does this for data read word by word - rather than line by line - from a data file (no need to tokenize, simply use the readWord method in my library).

  // Convert an English text to "Pig Latin"
  // Words may only contain letters
  // P.D. Terry, Rhodes University, 2004

  import Library.*;
  import java.util.*;

  class ToLatin1 {

    static String convert(String s) {
    // Convert s to Pig Latin - move first letter to end and then append "ay"
    // For example, "program" is returned as "rogrampay"
    // Words may only contain letters
      if (s.length() > 0) s = s.substring(1) + s.charAt(0) + "ay";
      return s;
    }

    public static void main (String[] args) {
    // first check that commandline arguments have been supplied
    // attempt to open data file
    // attempt to open results file
    // all this as in SampleIO.java - see full solution for details

    // read and process data file
      while (true) {
        String word = data.readWord();
        if (data.noMoreData()) break;
        results.write(convert(word));
        if (data.eol()) results.writeLine(); else results.write(' ');
      }

    // close results file safely
      results.close();
    }

  }

The corresponding program for decoding is essentially the same, with the convert method replaced by a deconvert method:

    static String deconvert(String s) {
    // Convert a Pig Latin word to English - check for the "ay" at the end
    // In this version the word proper is assumed to contain only letters
      int ay = s.lastIndexOf("ay");
      if (ay >= 1) s = s.charAt(ay-1) + s.substring(0, ay - 1);
      return s;
    }

While several people submitted solutions on these lines, few thought to check that the conversion or deconversion would actually be possible. Students, of course, are idealists and fondly believe that data will always be perfect and that nothing can ever go wrong - but it most certainly can, and by this stage of your careers you should be thinking all the time of how to write really reliable code.

A few submissions, I was pleased to say, had realized that "words" might not always be composed only of letters, but might be numbers, or be preceded or followed by punctuation marks:

He said: "I have found 100 mistakes in your wonderful textbook".

Situations like this are a bit harder to handle. Here are some possibilities that make use of the StringBuffer class, which you should learn about, as it is very useful for manipulating strings dynamically. I have deliberately left this code badly commented, just to drive home the point that reading someone else's uncommented code often takes considerable effort.

    static String convert(String s) {
    // Convert s to Pig Latin - move first letter to end and then append "ay"
    // For example, "program" is returned as "rogrampay"
    // Words may start and end with non-letters which remain there
    // For example "1234" is returned as 1234; "Hello!!" as "elloHay!!"
      StringBuffer sb = new StringBuffer();
      int i = 0, sl = s.length();
      while (i < sl && !Character.isLetter(s.charAt(i))) {
        sb.append(s.charAt(i)); i++;
      }
      if (i < sl) {
        char first = s.charAt(i);
        i++;
        while (i < sl && Character.isLetter(s.charAt(i))) {
          sb.append(s.charAt(i)); i++;
        }
        sb.append(first); sb.append("ay");
        while (i < sl) {
          sb.append(s.charAt(i)); i++;
        }
      }
      return sb.toString();
    }

    static String deconvert(String s) {
    // Convert a Pig Latin word to English - check for the "ay" at the end
    // In this version non-letters can precede and follow the word proper
      int ay = s.lastIndexOf("ay");
      if (ay < 0) return s;
      StringBuffer sb = new StringBuffer();
      int i = 0;
      while (i < ay - 1 && !Character.isLetter(s.charAt(i))) {
        sb.append(s.charAt(i)); i++;
      }
      sb.append(s.charAt(ay-1));
      sb.append(s.substring(i, ay - 1));
      int L = s.length();
      if (L - ay - 2 > 0) {
        sb.append(s.substring(ay + 2, L));
      }
      return sb.toString();
    }

Of course the situation is really even more complicated - there might be words with interior punctuation:

He said, unconvincingly, "That's the story of my life, my dear Gambol Hedge-Bette".

but the refinements needed to handle this are left as an exercise.

Task 13 - Internet/Janet address conversion

Once again this can be handled quite easily. Points missed were that a single conversion algorithm works for conversion in either direction, and that there might be any number of dot-separated components of the name and address. Once again the program should be able to handle input data that, strictly speaking, is incorrect, but few people thought of that.

  // Convert Internet  <---> Janet addresses
  // P.D. Terry, Rhodes University, 2004

  import java.util.*;
  import Library.*;

  class Internet {

    static String convert(String address) {
    // Returns opposite form of address
      int i = address.indexOf('@');
      String user = address.substring(0, i+1);
      address = address.substring(i+1);
      StringTokenizer tokens = new StringTokenizer(address, ".", true);
      address = "";
      while (tokens.hasMoreTokens())
        address = tokens.nextToken() + address;
      return user + address;
    }

    public static void main (String[] args) {
    // first check that commandline arguments have been supplied
    // attempt to open data file
    // attempt to open results file
    // all this as in SampleIO.java - see full solution for details

    // read and process data file
      while (true) {
        String address = data.readWord();
        if (data.noMoreData()) break;
        results.writeLine(convert(address));
      }

    // close results file safely
      results.close();
    }

  }

Okay, I'll 'fess up. When I first set this exercise I thought it might be fun to use the StringBuffer class and the Stack class, and came up with this:

    static String convert(String address) {
    // Returns opposite form of address
      Stack st = new Stack();
      StringBuffer sb = new StringBuffer();
      StringTokenizer tokens = new StringTokenizer(address, ".@", true);
      String t;
      do {
        t = tokens.nextToken(); sb.append(t);
      } while (tokens.hasMoreTokens() && !t.equals("@"));
      while (tokens.hasMoreTokens()) st.push(tokens.nextToken());
      while (!st.empty()) sb.append(st.pop());
      return sb.toString();
    }

but, as already mentioned, there is always a simpler way!