Computer Science 301 - 2000

Programming Language Translation


Practical for Week 20, beginning 4 September 2000

Hand in this prac sheet on your next practical day, correctly packaged in a transparent folder and with your solutions. This prac sheet forms the "cover sheet". Since the practical will have been done on a group basis, please hand in one copy of the prac sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.

Your surname: (PRINT CLEARLY)          Your prac day:

Your student number: (PRINT CLEARLY)

Your signature:                        Your group partners:


Other people with whom you collaborated (you need not give tutors' names):



Mark awarded:                          Tutor's signature


Objectives

Last year it became clear to me that many students lack confidence in C++ input, output and string handling, so this practical aims to provide a number of short exercises to acquire the confidence that we shall need later. One of the exercises is directly "compiler related", however. An appendix to this prac sheet summarizes those parts of the ctype, stdio and string libraries that you will find yourselves using.

I would greatly prefer you to do these exercises in C++ using the stdio library, and not in Pascal. As from next week you will have a true choice of language for your pracs. As before, joint submissions are required, but please resist the temptation simply to divide up the various tasks. Discuss them thoroughly with your prac partners and with the demonstrators.

You will need this prac sheet, the handout defining the Clang language, and the notes on string handling in C/C++, with which you may not be familiar.

Copies of this prac sheet and of the Clang report are also available on the web site for the course.


To hand in:

This week you are required to hand in, besides this cover sheet:

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 10 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box in the secretary's office for collection by Pat Terry.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so.

The source files misc.h and set.h are again included in the prac kit PRAC20.ZIP (which you can copy and unpack in the usual way; you will not actually need set.h). misc.h is defined as given below. The idea is that you simply #include misc.h in your programs and the system will then automagically include the headers for the rest of the C libraries that you need.

      // Various common items

      #ifndef MISC_H
      #define MISC_H

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <stdarg.h>
      #include <ctype.h>
      #include <limits.h>

      #define  boolean  int
      #define  bool     int
      #define  true     1
      #define  false    0
      #define  TRUE     1
      #define  FALSE    0
      #define  maxint   INT_MAX

      #if __MSDOS__ || MSDOS || WIN32 || __WIN32__
      #  define  pathsep '\\'
      #else
      #  define  pathsep '/'
      #endif

      static void appendextension (char *oldstr, char *ext, char *newstr)
      // Changes filename in oldstr from PRIMARY.xxx to PRIMARY.ext in newstr
      { int i;
        char old[256];
        strcpy(old, oldstr);
        i = strlen(old);
        while ((i > 0) && (old[i-1] != '.') && (old[i-1] != pathsep)) i--;
        if ((i > 0) && (old[i-1] == '.')) old[i-1] = 0;
        if (ext[0] == '.') sprintf(newstr,"%s%s", old, ext);
          else sprintf(newstr, "%s.%s", old, ext);
      }

      #endif /* MISC_H */

Task 1 (source code listing must be submitted)

Write a very short program TASK1.CPP that simply reads a text file character by character from stdin and copies it exactly to stdout. Three text files are provided in the prac kit as sample data (you should NOT try to edit them):

         POE.TXT      13331 bytes
         TWAIN.TXT    13680 bytes
         TABS.TXT     251 bytes

You should be able to run your program from a DOS prompt with a command like

TASK1 <POE.TXT >POE.NEW

and after the program has finished executing you should check that the copy is exactly the same as the original (do this for all three data files). You can do this by using the FC (file compare) command

FC POE.TXT POE.NEW

and follow this by looking at the directory entries

DIR POE.*

Task 2 (source code listing must be submitted)

You should recall that if you declare your main function in C++ to be

int main (int argc, char *argv[])

then the operating system will arrange for the number of command line arguments to be retrievable from argc, and the text of the arguments themselves from the array argv (the first of these is the name of the program executable).

Write a program TASK2.CPP that adds the numbers provided as command line arguments to give a very simple calculator. For example

TASK2 1 4 6 7 9

should display the result 27.

Task 3 (source code listing must be submitted)

Rewrite the program in Task 1 so that it can still work as before, but if the name of the input file appears as a parameter on the command line, input will be taken from that file, and output written to a file with the same primary name, but the extension NEW. For example

TASK3 POE.TXT

should read POE.TXT and copy it to POE.NEW. (Alternatively, of course, the command

TASK3 <POE.TXT >POE.NEW

should achieve the same effect.)

Hints:

As before, check carefully that the copy of the file is identical to the original.

Task 4 (source code listing must be submitted)

Modify the program in Task 3 so that it also counts the number of lines in the file as it copies them, and displays this count on the stderr output file (recollect that stderr is opened automagically to the screen).

Task 5 (source code listing must be submitted)

At last we get onto a more challenging problem that is rather more compiler-oriented.

Firstly, some background. By now you will have seen the use of QEdit as a "development environment". QEdit can work in conjunction with various other programs so that when a hot key is pressed (actually CTRL-F9 or CTRL-F10 in our set up) then the combination of programs:

This makes for very easy development of user-friendly new compilers. The Clang compiler, for example, has been written to be the executable CLANG.EXE. A command of the form

CLANG bad.cln errors.lst

will invoke the compiler, and produce a file ERRORS.LST that might read

       bad.cln ( 1 , 1) 'PROGRAM' expected
       bad.cln ( 7 , 5) ';' expected
       bad.cln ( 10 , 16) number expected

CLANG.EXE is automagically invoked by pressing CTRL-F9 from within QEdit if the source text being edited resides in a file with a .CLN extension. The toy assembler you used last year is an executable ASM.EXE and is automagically invoked by QEdit if the source text being edited resides in a file with a .ASM extension, and so on.

Now we are not going to write a complete compiler this week (wait for it, we'll get there soon enough). But let's have a look at some key features of the things compilers do, by writing a program that

Get this program to work in conjunction with QEdit so that you can edit a Clang program and then after "compilation" step through the procedures and functions, and through the variable declarations. Here is a silly Clang program, to show the output we would wish to produce:

      PROGRAM   Silly;
        VAR
          first, Second,
          third;

        PROCEDURE Sum;
          BEGIN
            Write(first + second + third);
          END;

        FUNCTION Average (X, Y);
          BEGIN
            RETURN (X + Y) / 2;
          END;

        BEGIN
          READ (first, second);
          Sum;
          WRITE(Average(first, second))
        END.

     FILE.CLN  (  6, 3  ) PROCEDURE
     FILE.CLN  ( 11, 3  ) FUNCTION
     FILE.CLN  (  1, 11 ) Silly
     FILE.CLN  (  3, 5  ) First
     FILE.CLN  (  3, 12 ) Second
     FILE.CLN  (  4, 5  ) Third
     FILE.CLN  (  6, 13 ) Sum
     FILE.CLN  ( 11, 12 ) Average
     FILE.CLN  ( 11, 20 ) X
     FILE.CLN  ( 11, 23 ) Y

This problem probably looks impossibly difficult at first, second, third ... glance. But it isn't really. There are several ways in which it might be solved. Here is one possibility:

Regard a Clang source program simply as a sequence of concatenated strings - alternate words and non-words. The first string in the above program is the word "PROGRAM", the second is the string "    " that separates "PROGRAM" from "Silly", the third is the word "Silly", the fourth is the string starting with ";" and ending with the space just before "VAR" and so on.

Given this insight, we can solve the problem by first developing a function that will obtain the "next" of these strings from the Clang source file each time it is called. The function might be defined to have a prototype like

    int fgetnextstr (char *str, FILE *stream, int &line, int &column);
    /* Reads next string from stream into str and returns the line and column
       in which it appeared in the stream.
       Strings are of two kinds; the return value of the function
       distinguishes them from one another:
       Returns 1 if the string is a valid identifier or keyword (consists of
       an initial letter followed by other letters and digits only)
       Returns 2 if the string does not start with a letter, and contains no letters
       Returns 0 if the stream is exhausted (no further string could be extracted) */

The complete program can then be developed around the idea of a loop which simply calls this scanner function repeatedly. Each time the "scanner" discovers an identifier or keyword, the corresponding string is compared (in a case insensitive way) with the entries in a table of words already known (this table is initialized to contain the Clang keywords in UPPERCASE). If a match with PROCEDURE or FUNCTION is found, the position of the string is recorded on the output file; otherwise if it is a word that has not been declared before the word is added to the table and its position in the source text is recorded. Once all the text has been read, the table can be scanned and all the identifiers and their first positions recorded on the output file.

This program has two features in common with a "real" compiler - it needs to be able to set up and interrogate a "symbol table", and it needs to be able to unpack ("scan") characters into "tokens". Keep the table handling simple - an array of strings will suffice, along with a simple linear search algorithm.

Clang programs are "case insensitive", so that it does not really matter if the various words and identifiers appear in the original source in upper or lower case, or in a mixture of the two. Of course, the various spellings must all be regarded as equivalent by the tablehandler.

Comments tend to mess this sort of thing around, of course, because words in comments might get confused with variable and procedure names. For the purposes of this exercise simply assume that comments will never form part of any Clang program you are asked to analyse. (Good Heavens! Who ever expected to find a comment in a student program anyway!)

One last (first?) word of warning. In my experience student programs that attempt to handle strings in C++ are notoriously bug ridden, because students don't really understand how memory is allocated to strings. See if you can restore my confidence!


CTYPE.H

  isalnum (c)    True if c is a letter or digit
  isalpha (c)    True if c is a letter
  isdigit (c)    True if c is a digit
  iscntrl (c)    True if c is a delete character or ordinary control character
  isascii (c)    True if c is a valid ASCII character
  isprint (c)    True if c is a printable character
  isgraph (c)    Like isprint except that the space character is excluded
  islower (c)    True if c is a lowercase letter
  isupper (c)    True if c is an uppercase letter
  ispunct (c)    True if c is a punctuation character
  isspace (c)    True if c is a space, tab, carriage return, newline, vertical tab, or form-feed
  isxdigit (c)   True if c is a hexadecimal digit
  toupper (c)    Converts c in the range [a-z] to characters [A-Z]
  tolower (c)    Converts c in the range [A-Z] to characters [a-z]
  toascii (c)    Converts c greater than 127 to the range 0-127 by clearing all but the lower 7 bits

STDIO.H

Input/output library for text and binary files (not all functions shown here, only the most common ones).

  int fclose (FILE *stream);
  /* Closes stream.
     If successful, returns 0.   If unsuccessful, returns EOF. */

  int feof (FILE *stream);
  /* Returns nonzero if end-of-file has been reached on stream. */

  int ferror (FILE *stream);
  /* Returns nonzero if an error has occurred on stream. */

  int fgetc (FILE *stream);
  /* Reads character (or EOF) from stream.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  int fgetchar (void);
  /* Reads a character (or EOF) from stdin.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  char *fgets (char *str, int n, FILE *stream);
  /* Reads a string str of at most n characters from stream.
     Collects input from stream until a newline character (\n) is found or at
     most n-1 characters are read. (If read, \n is placed in the string.)
     If successful, returns a pointer to the nul-terminated string str.
     If unsuccessful, returns NULL. */

  FILE *fopen (const char *filename, const char *mode);
  /* Opens stream in required mode to external filename.
     If successful, returns pointer to the newly opened stream.
     If unsuccessful, returns NULL. */

  int fprintf (FILE *stream, const char *format { , argument } );
  /* Sends formatted output to stream.
     Uses the same format specifiers as printf, but fprintf sends output to
     the specified stream.
     If successful, returns the number of bytes output.
     If unsuccessful, returns EOF. */

  int fputc (int c, FILE *stream);
  /* Writes character c to stream.
     If successful, returns c.
     If unsuccessful, returns EOF. */

  int fputchar (int c);
  /* Writes character c to stdout.
     If successful, returns c.
     If unsuccessful, returns EOF. */

  int fputs (const char *str, FILE *stream);
  /* Writes string str to stream.
     If successful, returns last character written.
     If unsuccessful, returns EOF. */

  int fscanf (FILE *stream, const char *format { , address } );
  /* Performs formatted input from stream.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value doesn't include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments. */

  int getc (FILE *stream);
  /* Reads character (or EOF) from stream.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  int getchar (void);
  /* Reads character (or EOF) from stdin.
     If successful, returns the character read, after converting it to an
     int without sign extension.
     If unsuccessful, returns EOF. */

  char *gets (char *str);
  /* Reads string str from stdin.
     Collects input from stdin until a newline character (\n) is found.
     \n is not placed in the string.
     If successful, returns a pointer to the nul-terminated string str.
     If unsuccessful, returns NULL. */

  int printf (const char *format { , argument } );
  /* Formatted output to stdout.
     Processes a variable number of arguments according to the format,
     sending the output to stdout.
     If successful, returns the number of bytes output.
     If unsuccessful, returns EOF. */

  int putc (int c, FILE *stream);
  /* Writes character c to stream.
     If successful, returns the character c.
     If unsuccessful, returns EOF. */

  int putchar (int c);
  /* Writes character c on stdout.
     If successful, returns the character c.
     If unsuccessful, returns EOF. */

  int puts (const char *str);
  /* Writes string str to stdout (and appends a newline character).
     If successful, returns the last character written.
     If unsuccessful, returns EOF. */

  void rewind (FILE *stream);
  /* Repositions file pointer to stream's beginning. */

  int scanf (const char *format { , argument } );
  /* Performs formatted input from stdin.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value does not include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments. */

  int sprintf (char *buffer, const char *format { , argument } );
  /* Performs formatted output to a string buffer.
     If successful, returns the number of bytes output.
     If unsuccessful, returns EOF. */

  int sscanf (const char *buffer, const char *format { , address } );
  /* Performs formatted input from a string buffer.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value does not include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments.
     If sscanf attempts to read past end of buffer, the return value is EOF. */

  int ungetc (int c, FILE *stream);
  /* Pushes the character c back into input stream, so that the next call to
     getc (or to other stream input functions) for stream will return c again.
     If successful, returns c.
     If unsuccessful, returns EOF. */

Predefined streams automatically opened when the program is started.

  stdin      Standard input device.
  stdout     Standard output device.
  stderr     Standard error output device.

Other predefined quantities

  FILE       File control structure for streams.
  NULL       Null pointer value.
  EOF        value of character returned when end-of-file is encountered.

STRING.H

  /* size_t is an unsigned integer type; its width depends on the implementation. */

  char *strcpy (char *dest, const char *src);
  /* Copies string src to dest.  Returns dest. */

  char *strncpy (char *dest, const char *src, size_t n);
  /* Copies at most n chars from src to dest.  If src holds n or more characters, no null
     character is appended, and dest is then not a null-terminated string.  Returns dest. */

  size_t strlen (const char *str);
  /* Returns length of str. */

  int strcmp (const char *s1, const char *s2);
  /* Compares one string to another, case significant
     Returns 0 if s1 = s2,  < 0 if s1 < s2, > 0 if s1 > s2. */

  int stricmp (const char *s1, const char *s2);  /* might be called strcasecmp */
  /* Compares one string to another, ignoring case
     Returns 0 if s1 = s2,  < 0 if s1 < s2, > 0 if s1 > s2. */

  char *strstr (const char *str, const char *substr);
  /* Returns pointer to first location of substr within str or NULL. */

  char *strcat (char *s1, const char *s2);
  /* Appends s2 to s1. */

  char *strncat (char *s1, const char *s2, size_t n);
  /* Appends not more than n chars from s2 to s1. */

STDLIB.H

  int abs (int x);
  /* Returns the absolute value of integer x. */

  int atexit (atexit_t func);
  /* Registers termination function.  Returns 0 on success and nonzero on failure. */

  double atof (const char *str);
  /* Converts string str to a floating point number.
     Returns the converted value of str, or 0 if str cannot be converted. */

  int atoi (const char *str);
  /* Converts string str to integer.
     Returns the converted value of str, or 0 if str cannot be converted. */

  void exit (int status);
  /* Terminates program.  Defined values for status are
        EXIT_SUCCESS  Normal program termination
        EXIT_FAILURE  Abnormal program termination
     Before terminating, buffered output is flushed, files are closed, and exit functions are called. */

  void free (void *block);
  /* Frees block previously allocated by a call to malloc. */

  char *getenv (const char *name);
  /* Gets string from environment.
     Returns a pointer to the value associated with name, or NULL if name is not defined in
     the environment. */

  char *itoa (int value, char *str, int radix);
  /* Converts an integer value to a string.  Returns a pointer to the target string.
     For a decimal representation, use radix=10.  For hexadecimal, use radix=16. */

  char *ltoa (long value, char *str, int radix);
  /* Converts a long value to a string.  Returns a pointer to the target string.
     For a decimal representation, use radix=10.  For hexadecimal, use radix=16. */

  void *malloc (size_t size);
  /* Allocates memory.  size is in bytes.  Returns a pointer to the newly allocated block, or NULL if
     insufficient space exists for the new block.  If size == 0, it returns NULL. */

  int rand (void);
  /* Returns random number between 0 and RAND_MAX. */

  int random (int num);
  /* Returns a random integer between 0 and (num-1). */

  void randomize (void);
  /* Initializes the random number generator with a random value.  It uses the
     time function, so you should include time.h when using this routine. */