The discussion of executable sizes was not carried right through by everyone, but most students seemed to get the idea. For the record, here are the sizes:
Executable sizes derived from sources shown: Using a "set" Using an "array" EXE OBJ EXE OBJ Turbo Pascal: SIEVE.PAS 3504 SIEVE1.PAS 2960 JPI Modula-2 SIEVE.MOD 21282 2025 SIEVE1.MOD 21220 1960 Using cin and cout: Borland C 3.1: SIEVE.CPP 14932 1745 SIEVE1.CPP 14788 1463 Borland C 5.5: SIEVE.CPP 149504 4546 SIEVE1.CPP 149504 4260 Microsoft C: SIEVE.CPP 40960 SIEVE1.CPP 40960 Using scanf and printf: Borland C 3.1: SIEVE.CPP 10352 1558 SIEVE1.CPP 10192 1260 Borland C 5.5: SIEVE.CPP 66560 1806 SIEVE1.CPP 66048 1527 Microsoft C: SIEVE.CPP 37768 SIEVE1.CPP 37768 Using scanf and printf after translation from Modula-2 by X2C Borland C 3.1: SIEVE.C 30614 1417 SIEVE1.C 30630 1429 Borland C 5.5: SIEVE.C 62976 1590 SIEVE1.C 62976 1583 Microsoft C: SIEVE.C 45056 SIEVE1.C 45056
It is quite clear that the C++ compilers, certainly when using the iostream library, produce enormous code bloat. I am pretty sure that these differences are due mostly to the extent to which I/O library implementation is layered. To that end, I experimented further. Null programs like these:
PROGRAM p; MODULE p; void main(void) { BEGIN END p. } END.
generated executable sizes of 1472 (Pascal), 1773 (Modula-2), 4204 (Borland 3.1 C++) and 47104 (Borland 5.5 C++).
I performed a further experiment, stripping out all the I/O statements from the "array" version of the sieve programs. Executable sizes were 1680 (Pascal), 1932 (Modula-2 with array range checks), 1874 (Modula-2 with no array range checks), 6378 (Borland 3.1 C++) and 47104 (Borland 5.5 C++). Clearly the Pascal and Modula-2 compilers are much tighter, and perform very well (only adding a few hundred bytes to the "null" program base sizes).
The Borland 5.5 and Microsoft C++ compilers are 32-bit ones, rather than 16-bit ones. But even allowing for this, they suffer from bizarre code bloat for small applications. There may be command line parameters and options that one can set to try to produce tighter code, but I have not bothered to experiment further. Some (most) of the overheads may relate to the fact that they have to produces "Windows" compatible programs. Compilers like this are clearly designed with an "everyone has 128MB of memory and 6GB of disk space, and if they don't they should go and buy more" philosophy.
Several students complained that the C code generated by the X2C system was "unreadable", and "not what we would have written ourselves". There can be no dispute with the second of those arguments, but if you take a careful look at the generated C code, it is verbose, rather than unreadable, because long identifier names have been used. This is actually not such a bad thing - at least one can tell the "origin" of any identifier, since the original module in which it was declared is incorporated into its name, and is a very useful style, both for large programs, and (more especially) when one has to contend with the miserable rules that C has for controlling identifier name spaces. In fact, in the days whe I used Modula regularly, I used a convention similar to this of my own accord, and carried it over into my C coding, as you will see from the source in the appendices to the book. Some of the other "unreadability" presumably relates to the fact that the X2C system is obliged to translate the CARDINAL type to the unsigned int type, which is not one many of you will ever have used - this explains all those funny casts and capital U's that you saw.
Similarly, the discussion of "limits on the size of the sieve" was not taken right through by many people. Several discovered that the Pascal "set" size is limited to 256 elements (which is very restrictive for this conceptually very useful data structure). The JPI Modula-2 compiler allows very large sets, because, like the set.h implementation, it represents sets by packing 8 elements to a byte. The Pascal, Modula and (I suspect) Borland 3.1 compilers had to work within an architectural limitation of earlier processors running under DOS that no data structure could exceed 64K bytes in size. While this gave the illusion that one could set up "array" based sieves of very nearly this size, in practice there is another nasty limitation, which some of you discovered and others probably missed. In 16 bit compilers the range of an int or INTEGER was limited to -32768 .. 32768 and an unsigned int (or CARDINAL in Modula-2 terminology) to 0 .. 65355. Attempts to drive the sieve larger are handicapped by this. In the case of the Pascal and Modula programs one soon gets "array index out of range" errors. In the case of the Borland 3.1 C++ system one gets what one deserves for driving without brakes - rubbish (at best) and a total collapse (at worst). On the other hand the 32 bit compilers allow very much bigger sieves as the int type has a range of about 4 billion, and the operating system and large memories in the computers support larger data structures. With a memory size of about 64M and an int size of 4 bytes I expect the system would handle arrays of up to about 8MB with no problem, and several students reported this (I am not brave or perhaps foolish enough to try pushing my own system to its limits using the inferior protection provided by C++ and Windows, sorry. I once lost a hard disk because a pointer went out of range unchecked. Not a nice situation to be in!) On the other hand, the limitation imposed by the Clang system is entirely due tot he fact that the interpreter system only allows for 8000 words of memory - which has to hold the pseudo-code and all variables. The limit on the sieve size was about 7900 I think. This could have been extended simply by modifiying the interpreter, and then recompiling it, but you have not seen how to do that (yet).
Code follows for the Sieve programs in C++ using scanf/printf:
Note that the program includes an exit(1) call to abort itself if it discovers that the input data is out of range. Develop the habit of doing this sort of thing and making these sorts of checks. C compilers don't (can't) generate code to apply array bound checks, so if you write array handling programs that go out of bounds you have only yourself to blame when spectacular disasters occur as soon as your buggy programs modify memory in odd places!
// Sieve of Eratosthenes for finding primes 2 <= n <= 4000 // P.D. Terry, Rhodes University, 2001 #include "misc.h" const int Max = 4000; typedef boolean SIEVES[Max - 1]; void main(void) { int i, n, k; // counters long primes = 0; SIEVES Uncrossed; // the sieve printf("Supply largest number to be tested "); scanf("%d", &n); if (n > Max) { printf("n too large, sorry"); exit(1); } printf("Prime numbers between 2 and %d\n", n); printf("-----------------------------------\n"); for (i = 2; i <= n; i++) Uncrossed[i-2] = true; // clear sieve for (i = 2; i <= n; i++) { // the passes over the sieve if (Uncrossed[i-2]) { if (primes % 10 == 0) putchar('\n'); // ensure line not too long primes++; printf("%5d", i); // now cross out multiples of i k = i; do { Uncrossed[k-2] = false; k += i; } while (k <= n); } } exit(0); }
// Sieve of Eratosthenes for finding primes 2 <= n <= 4000 // P.D. Terry, Rhodes University, 2001 #include "misc.h" #include "set.h" const int Max = 4000; typedef Set<Max> SIEVES; void main(void) { int i, n, k; // counters long primes = 0; SIEVES Uncrossed; // the sieve printf("Supply largest number to be tested "); scanf("%d", &n); if (n > Max) { printf("n too large, sorry"); exit(1); } printf("Prime numbers between 2 and %d\n", n); printf("-----------------------------------\n"); Uncrossed.fill(); for (i = 2; i <= n; i++) { // the passes over the sieve if (Uncrossed.memb(i)) { if (primes % 10 == 0) putchar('\n'); // ensure line not too long primes++; printf("%5d", i); // now cross out multiples of i k = i; do { Uncrossed.excl(k); k += i; } while (k <= n); } } exit(0); }
The lucky number program was rather badly done by many people. For a start, you had been asked to write a function that passed the list back in an array parameter - some groups had simply hacked together a main routine only. The exercise was supposed to heighten your awareness of array handling and parameter passing, and so in that respect you may have lost out; in which case study the code below carefully.
Several solutions submitted produced lists with completely the wrong numbers, although perhaps they might have "looked" okay. Others did a huge amount of extra work, marking an array, then sweeping it all over again to extract numbers into a second array, then sweeping again to copy back into the original array.
Underneath every apparently complex problem there is often a simple solution. Seeking simplicity and order is what science is all about, so try to get into the habit. Unfortunately, the modern trend towards cut-and-paste interactive environments encourages a hack-the-code-any-old-how mentality!
// Simple test bed program for checking on the generation of lucky numbers // P.D. Terry, Rhodes University, 2001 #include "misc.h" #include <iostream.h> const int Max = 4000; typedef int LISTS[Max]; void GenerateLuckyNumbers(int LuckyList[], int &Top) // Generate a LuckyList of the lucky numbers between 1 and Top (inclusive) { int I, Copied, Step = 2; if (Top > Max) Top = Max; // check for potential disaster for (I = 1; I <= Top; I++) // set up initial list LuckyList[I-1] = I; while (Step <= Top) { // sweep and copy down elements Copied = 0; for (I = 1; I <= Top; I++) { if (I % Step != 0) { // only copy every Step-th element Copied++; // but count how many are copied LuckyList[Copied-1] = LuckyList[I-1]; } } Top = Copied; // new limit on the length of list Step++; } } void main(void) { int I, Limit; LISTS TestList; cout << "Supply limit of numbers to be tested for luck "; cin >> Limit; GenerateLuckyNumbers(TestList, Limit); for (I = 1; I <= Limit; I++) { cout.width(5); cout << TestList[I-1]; } exit(0); }
Here is a Pascal solution, for interest:
PROGRAM Lucky; (* Simple test bed program for checking on the generation of lucky numbers P.D. Terry, Rhodes University, 2001 *) CONST Max = 1000; TYPE LISTS = ARRAY [1 .. Max] OF INTEGER; PROCEDURE GenerateLuckyNumbers (VAR LuckyList : LISTS; VAR Top : INTEGER); (* Generate a LuckyList of the lucky numbers between 1 and Top (inclusive *) VAR I, Step, Copied : INTEGER; BEGIN IF Top > Max THEN Top := Max; (* check for potential disaster *) FOR I := 1 TO Top DO (* set up initial list *) LuckyList[I] := I; Step := 2; WHILE Step <= Top DO BEGIN (* sweep and copy down elements *) Copied := 0; FOR I := 1 TO Top DO BEGIN IF I MOD Step <> 0 THEN (* only copy every Step-th element *) BEGIN INC(Copied); (* but count how many are copied *) LuckyList[Copied] := LuckyList[I] END END; Top := Copied; (* new limit on the length of list *) INC(Step) END; END; VAR I, Limit : INTEGER; TestList : LISTS; BEGIN Write('Supply limit of numbers to be tested for luck '); Read(Limit); GenerateLuckyNumbers(TestList, Limit); FOR I := 1 TO Limit DO BEGIN Write(TestList[I]:5) END END.
PROGRAM Luck1; (* Simple test bed program for checking on the generation of lucky numbers P.D. Terry, Rhodes University, 2001 *) CONST Max = 1000; FUNCTION GenerateLuckyNumbers (LuckyList[], Top); (* Generate a LuckyList of the lucky numbers between 1 and Top (inclusive *) VAR I, Step, Copied; BEGIN IF Top > Max THEN Top := Max; (* check for potential disaster *) I := 1; WHILE I <= Top DO (* initial list *) BEGIN LuckyList[I] := I; I := I + 1 END; Step := 2; WHILE Step <= Top DO BEGIN (* sweep and copy down elements *) Copied := 0; I := 1; WHILE I <= Top DO BEGIN IF I / Step * Step <> I THEN (* I MOD Step = 0 *) BEGIN (* only copy every Step-th element *) Copied := Copied + 1; (* but count how many are copied *) LuckyList[Copied] := LuckyList[I] END; I := I + 1; END; Top := Copied; (* new limit on the length of list *) Step := Step + 1 END; RETURN Top; END; VAR I, Limit, TestList[1000]; BEGIN Write('Supply limit of numbers to be tested for luck '); Read(Limit); Limit := GenerateLuckyNumbers(TestList, Limit); I := 1; WHILE I <= Limit DO BEGIN Write(TestList[I]); I := I + 1 END END.
The sorting program clearly caused much confusion. I was delighted to see that most people knew where to look in their excellent second year course notes to find the algorithms. Unfortunately the version of the insertion sort you found did not translate easily, as it used an AND operator in the control of a WHILE loop, which the present version of Clang does not support. Sadly not everyone managed to find a clean (or indeed, any) solution to this problem. Consequently, various versions of the insertion sort are given below, as well as two versions of the bubble sort.
The Insertion sort can be made very simple if you work "downwards" rather than "upwards", and use List[0] as a sentinel (see Insert1Sort below). This allows one to avoid the potential AND operation altogether - and still to stay safely within the bounds of the array at all times. Of course, this relies on using array elements List[1] ... List[N] to store the N elements of the list rather than List[0] ... List[N-1], but that is a small price to pay, and, perhaps, makes all the algorithms simpler in any case.
A feature of many programs submitted was a cavalier approach to declaring far too many variables "global". Try to declare variables as locally as possible.
Clearly several people have little idea of how and why one must use semaphores to protect critical resources (like the overall counter, and, indeed, access to the printer).
Finally, the level of commenting in most of your code leaves a lot to be desired. One can overcomment simple code, but there should always be some sort of indication for each routine of what it is intended to do, expressed in terms of its parameters. And code should always have your names buried in it at the top.
PROGRAM CompareSorts; (* Compare speed of sorting by using several sorts in parallel; also count number of array element assignments performed P.D. Terry, Rhodes University, 2001 *) VAR Assignments, Mutex, Printer (* Semaphore *); PROCEDURE Increment(x); (* Protected increment of global variable Assignments *) BEGIN WAIT(Mutex); Assignments := Assignments + x; SIGNAL(Mutex) END; PROCEDURE WriteList(List[], N); (* Display N elements in List[1] ... List[N] *) VAR I; BEGIN I := 1; WHILE I <= N DO BEGIN Write(List[I]); I := I + 1 END; END; PROCEDURE BubbleSort (List [], N); (* Sort List[1] ... List[N] into ascending order *) VAR I, J, T; BEGIN I := 1; WHILE I <= N-1 DO BEGIN J := I + 1; WHILE J <= N DO BEGIN IF List[I] > List[J] THEN BEGIN T := List[I]; List[I] := List[J]; List[J] := T; Increment(3); END; J := J + 1 END; I := I + 1 END; WAIT(Printer); Write('BubbleSort completed'); SIGNAL(Printer); END; (* The "better" bubblesort achieves much better performance in some cases - in particular when the data is highly ordered to begin with. It attempts to cut down the effective length of list to scan as much as possible on each pass *) PROCEDURE BetterBubbleSort (List [], N); (* Sort List[1] ... List[N] into ascending order *) VAR I, Length, LastChange, T; BEGIN Length := N; WHILE Length > 1 DO BEGIN LastChange := 1 (* optimistic *); I := 1; WHILE I <= Length - 1 DO BEGIN IF List[I] > List[I+1] THEN BEGIN T := List[I]; List[I] := List[I+1]; List[I+1] := T; Increment(3); LastChange := I END; I := I + 1; END; Length := LastChange END; WAIT(Printer); Write('BetterBubbleSort completed'); SIGNAL(Printer); END; (* The following variation on the insertion algorithm uses List[0] very effectively as a sentinel, making the WHILE loop condition as simple as possible *) PROCEDURE Insert1Sort (List [], N); (* Sort List[1] ... List[N] into ascending order *) VAR I, J; BEGIN I := 2; WHILE I <= N DO BEGIN List[0] := List[I]; Increment(1); J := I - 1; WHILE List[0] < List[J] DO BEGIN List[J+1] := List[J]; Increment(1); J := J - 1; END; List[J+1] := List[0]; Increment(1); I := I + 1 END; WAIT(Printer); Write('Insert1Sort completed'); SIGNAL(Printer); END; (* The next variation demonstrates a classic "state variable" approach to handling the WHILE loop *) PROCEDURE Insert2Sort (List [], N); (* Sort List[1] ... List[N] into ascending order *) CONST TRUE = 1; FALSE = 0; VAR I, J, Temp, Searching; BEGIN I := 2; WHILE I <= N DO BEGIN Temp := List[I]; Increment(1); J := I - 1; Searching := TRUE; WHILE Searching = TRUE DO BEGIN IF Temp >= List[J] THEN Searching := FALSE; IF Temp < List[J] THEN BEGIN List[J+1] := List[J]; Increment(1); J := J - 1; IF J = 0 THEN Searching := FALSE; END END; List[J+1] := TEMP; Increment(1); I := I + 1 END; WAIT(Printer); Write('Insert2Sort completed'); SIGNAL(Printer); END; (* The next alternative tries to overcome the lack of an AND operator by using a (nested) BOOLEAN function which can produce the correct "short circuit" effect by clever use of the RETURN statement *) PROCEDURE Insert3Sort (List [], N); (* Sort List[1] ... List[N] into ascending order *) CONST TRUE = 1; FALSE = 0; FUNCTION MustShift (J, Temp); BEGIN IF J = 0 THEN RETURN FALSE; IF Temp > List[J] THEN RETURN FALSE; RETURN TRUE END; VAR I, J, Temp, Searching; BEGIN I := 2; WHILE I <= N DO BEGIN Temp := List[I]; Increment(1); J := I - 1; WHILE MustShift(J, Temp) = TRUE DO BEGIN List[J+1] := List[J]; Increment(1); J := J - 1; END; List[J+1] := TEMP; Increment(1); I := I + 1 END; WAIT(Printer); Write('Insert3Sort completed'); SIGNAL(Printer); END; FUNCTION EqualList (L1[], L2[], N); (* Return TRUE iff L1[I] = L2[I] for all I in 1 ... N *) CONST FALSE = 0; TRUE = 1; VAR I, Equal; BEGIN Equal := TRUE; I := 1; WHILE I <= N DO BEGIN IF L1[I] <> L2[I] THEN Equal := FALSE; I := I + 1; END; RETURN Equal END; FUNCTION InOrder (L1[], N); (* Return TRUE iff L1[I] < L1[I+1] for all I in 1 ... N-1 *) CONST FALSE = 0; TRUE = 1; VAR I, Okay; BEGIN I := 1; WHILE I < N DO BEGIN IF L1[I] > L1[I+1] THEN RETURN FALSE; I := I + 1; END; RETURN TRUE END; (* Note how we declare the rest of the global variables as close as possible to the point where they are "used" *) VAR List1[500], List2[500], List3[500], List4[500], List5[500], N, Data; BEGIN Printer := 1; Mutex := 1; Assignments := 0; N := 0; Read(Data); WHILE Data <> 0 DO BEGIN N := N + 1; IF N > 500 THEN BEGIN Write('Too much data'); RETURN END; List1[N] := Data; List2[N] := Data; List3[N] := Data; List4[N] := Data; List5[N] := Data; Read(Data) END; (* we must supply each routine with its own copy of the list, as arrays are passed by reference *) COBEGIN Insert1Sort(List1, N); Insert2Sort(List2, N); Insert3Sort(List3, N); BubbleSort(List4, N); BetterBubbleSort(List5, N) COEND; Write(Assignments, ' assignments'); Write('EqualList = ', EqualList(List1, List2, N), 'Inorder = ', InOrder(List1, N)); END.