Computer Science 3 - 2001

Programming Language Translation


Practical for Week 15, beginning 30 July 2001

Hand in this prac sheet before lunch time on your next practical day, correctly packaged in a transparent folder with your solutions and the "cover sheet". Please do NOT come to a practical and spend the first hour printing or completing solutions from the previous week's exercises. Since the practical will have been done on a group basis, please hand in one copy of the cover sheet for each member of the group. These will be returned to you in due course, signed by the marker, and you will be asked to sign to acknowledge that you have received your own copy.


Objectives

In recent years it has become clear to me that many students lack confidence in C++ input, output and string handling, so this practical aims to provide a number of short exercises to acquire the confidence that we shall need later. Two of the exercises are directly "compiler related", however. An appendix to this prac sheet summarizes those parts of the ctype, stdio and string libraries that you will find yourselves using.

These exercises must be done in C++ using the stdio library, and not the iostreams library. As before, joint submissions are required, but please resist the temptation simply to divide up the various tasks. Discuss them thoroughly with your prac partners and with the demonstrators.

You will need this prac sheet, the handout defining the Clang language, and the notes on string handling in C/C++, with which you may not be familiar.

Copies of this prac sheet and of the Clang report are also available on the web site for the course.


To hand in:

This week you are required to hand in, besides this cover sheet, listings of your source code for Tasks 1, 3, 5, 6 and 7, as indicated in the task headings below.

Keep the prac sheet and your solutions until the end of the semester. Check carefully that your mark has been entered into the Departmental Records.

You are referred to the rules for practical submission which are clearly stated on page 13 of our Departmental Handbook. However, for this course pracs must be posted in the "hand-in" box in the secretary's office for collection by Pat Terry.

A rule not stated there, but which should be obvious, is that you are not allowed to hand in another student's work as your own. Attempts to do this will result in (at best) a mark of zero and (at worst) severe disciplinary action and the loss of your DP. You are allowed - even encouraged - to work and study with other students, but if you do this you are asked to acknowledge that you have done so.

The source files misc.h and set.h are again included in the prac kit PRAC15.ZIP (which you can copy and unpack in the usual way; you will probably not need set.h). misc.h is defined as given below. The idea is that you simply #include misc.h in your programs and the system will then automagically include the headers for the rest of the C libraries that you need.

      // Various common items

      #ifndef MISC_H
      #define MISC_H

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <stdarg.h>
      #include <ctype.h>
      #include <limits.h>

      #define  boolean  int
      #define  bool     int
      #define  true     1
      #define  false    0
      #define  TRUE     1
      #define  FALSE    0
      #define  maxint   INT_MAX

      #if __MSDOS__ || MSDOS || WIN32 || __WIN32__
      #  define  pathsep '\\'
      #else
      #  define  pathsep '/'
      #endif

      static void appendextension (char *oldstr, char *ext, char *newstr)
      // Changes filename in oldstr from PRIMARY.xxx to PRIMARY.ext in newstr
      { int i;
        char old[256];
        strcpy(old, oldstr);
        i = strlen(old);
        while ((i > 0) && (old[i-1] != '.') && (old[i-1] != pathsep)) i--;
        if ((i > 0) && (old[i-1] == '.')) old[i-1] = 0;
        if (ext[0] == '.') sprintf(newstr,"%s%s", old, ext);
          else sprintf(newstr, "%s.%s", old, ext);
      }

      #endif /* MISC_H */

Task 1 (source code listing must be submitted)

Write a very short program TASK1.CPP that simply reads a text file character by character from stdin and copies it exactly to stdout. Three text files are provided in the prac kit as sample data (you should NOT try to edit them):

POE.TXT 13331 bytes
TWAIN.TXT 13680 bytes
TABS.TXT 251 bytes

You should be able to run your program from a DOS prompt with a command using "redirection" like

TASK1 <POE.TXT >POE.NEW

and after the program has finished executing you should check that the copy is exactly the same as the original (do this for all three data files). You can do this by using the FC (file compare) command

FC POE.TXT POE.NEW

and follow this by looking at the directory entries

DIR POE.*

Task 2 (source code listing need not be submitted)

You should recall that if you declare your main function in C++ to be

int main (int argc, char *argv[])

then the operating system will arrange for the number of command line arguments to be retrievable from argc, and the text of the arguments themselves from the array argv (the first of these is the name of the program executable).

Rewrite the program in Task 1 so that it can still work as before, but if the name of the input file appears as a parameter on the command line, input will be taken from that file, and output written to a file with the same primary name, but the extension NEW. For example

TASK2 POE.TXT

should read POE.TXT and copy it to POE.NEW. (Alternatively, of course, the command

TASK2 <POE.TXT >POE.NEW

should still achieve the same effect.)

As before, check carefully that the copy of the file is identical to the original.
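
One way of arranging for the same copying code to serve both cases is to choose the streams before the copying starts. The helper below is a hypothetical sketch (openOrDefault is not part of misc.h or of any library): it falls back to an already open stream when no file name was supplied.

```cpp
#include <stdio.h>

// Returns fallback when name is NULL; otherwise tries to open the named
// file in the given mode, returning NULL if it cannot be opened.
static FILE *openOrDefault(const char *name, const char *mode, FILE *fallback) {
  if (name == NULL) return fallback;
  return fopen(name, mode);
}
```

In main one might write openOrDefault(argc > 1 ? argv[1] : NULL, "r", stdin) for the input, build the output name with appendextension from misc.h, and check both results against NULL before copying.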

Task 3 (source code listing must be submitted)

Modify the program in Task 2 so that it also counts the characters (bytes) in the file as it copies them, and displays this count as the file length on the stderr output file (recollect that stderr is opened automagically to the screen). Does this count match the count you see in a DIR listing of your files?

Task 4 (source code listing need not be submitted)

Change the emphasis slightly. Modify the program of Task 2 to produce one that will read the original file a line at a time, rather than a character at a time. As before, make sure that the original and the copy are identical.

Task 5 (source code listing must be submitted)

Modify the program from Task 4 to produce one that will read the original file a line at a time, and will copy the input to the output with each tab character "expanded" into enough spaces to reach the next column that is a multiple of 8. Suitable test data can be found in TABS.TXT; tab characters are rather difficult to enter into files with the standard version of QEdit, so don't try to edit TABS.TXT.
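
The expansion itself can be done on one line held in memory. The sketch below is one possible approach under two assumptions of mine: the line starts in column 0, and the caller supplies an output buffer large enough for the expanded text (the names expandTabs and TabWidth are hypothetical).

```cpp
#include <stddef.h>

enum { TabWidth = 8 };

// Expands the tabs in line into out, whose capacity is outSize bytes
// including the terminating nul. Output is truncated if out is too small.
// Returns the number of characters stored in out.
static size_t expandTabs(const char *line, char *out, size_t outSize) {
  size_t col = 0;                      // next output column, counted from 0
  if (outSize == 0) return 0;
  for (; *line != '\0' && col + 1 < outSize; line++) {
    if (*line == '\t') {
      // pad with spaces up to the next multiple of TabWidth
      do { out[col++] = ' '; } while (col % TabWidth != 0 && col + 1 < outSize);
    } else {
      out[col++] = *line;
    }
  }
  out[col] = '\0';
  return col;
}
```

A tab in column 1 thus becomes seven spaces, while a tab in column 8 becomes eight.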

Task 6 (source code listing must be submitted)

At last we get onto problems rather more compiler-oriented. Write a program that will read (as data) a C++ program whose source is taken either from standard input or from a file named as a command line parameter, and copy this to standard output (or, respectively, to a file with the extension changed to CCC), in the process stripping out all the comments. Of course, if you have been naughty you won't have any programs suitable for use as input (test) data, but if you have programmed wisely, you should have several.

/* Comments in C++ can be enclosed in these brackets */ but cannot be "nested". Alternatively they extend from // to the end of a line.

No, this is not as hard as it may seem. All you need is a variation on the program in Task 2 which applies some intelligence to deciding whether to copy the characters it reads to the output or not - when it detects the start of a comment it suspends the output process until it detects the end of the comment. However, take precautions, just in case some silly twit (professor?) asks you to strip comments from a file where a comment is "opened" and is never closed again (as in the annoying Clang program he gave you in Prac 14).

Task 7 (source code listing must be submitted)

As the most challenging of these exercises, develop a program that will help "beautify" Clang programs. To do a thorough job would be quite a tall order at this stage, so we'll keep it simple for the moment. Later in the course we might show you that with proper compiler tools a sophisticated system might be written in a matter of minutes, but for the moment an ad-hoc approach may sensitize you to the sorts of tasks that compilers deal with.

Clang programs are "case insensitive", so that it does not really matter if the various words and identifiers appear in upper or lower case, or in a mixture of the two. However, it really is much nicer if one uses a consistent typography. So write a program that will accept a Clang program and rewrite it with all the keywords in CAPITALS, and all the other identifiers in the same MixtureOfCase as that in which they first appeared. Perhaps a simple example will make this clearer: the program on the left is to be cleaned up to look like the program on the right.

    program   Silly;                      PROGRAM   Silly;
      Var first, Second, Var3;              VAR first, Second, Var3;
      Begin                                 BEGIN
        first := (seCOND + vaR3);             first := (Second + Var3);
        write('Value of first ', FIRST)       WRITE('Value of first ', first)
      end.                                  END.

There are several ways in which this might be done. Here is one starting point:

Regard a Clang source program simply as a sequence of concatenated strings - alternate words and non-words. The first string in the above program is the word "program", the second is the string " " that separates "program" from "Silly", the third is the word "Silly", the fourth is the string starting with ";" and ending with the space just before "Var" and so on.

Given this insight, we can solve the problem by first developing a function that will obtain the "next" of these strings from the Clang source file each time it is called. The function might be defined to have a prototype like

    int fgetnextstr (char *str, FILE *stream);
    /* Reads next string from stream into str.
       Strings are of two kinds; the return value of the function
       distinguishes them from one another:
       Returns 1 if the string is a valid identifier or keyword (consists of
       an initial letter followed by other letters and digits only)
       Returns 2 if the string does not start with a letter, and contains no letters
       Returns 0 if the stream is exhausted (no further string could be extracted) */

The complete program can then be developed around the idea of a loop which simply calls this function repeatedly. Each time the "scanner" discovers an identifier or keyword, the corresponding string is compared (in a case insensitive way) with the entries in a table of words already known (this table is initialized to contain the Clang keywords in UPPERCASE). If a match is found, the spelling of the word in the table is copied to the output; if no match is found, the word is copied to the output unchanged, but is also added to the table.

This program has two features in common with a "real" compiler - it needs to be able to set up and interrogate a "symbol table", and it needs to be able to unpack ("scan") characters into "tokens". Keep the table handling simple - an array of strings will suffice, along with a simple linear search algorithm.
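
A table of this kind might be sketched as below. Everything named here is an assumption made for illustration: the limits MaxWords and MaxLen, the function names, and in particular the initial contents, which list only the keywords that happen to appear in the example above and not the full set from the Clang report. The comparison is written out by hand because stricmp is not available under that name everywhere.

```cpp
#include <ctype.h>
#include <string.h>

enum { MaxWords = 200, MaxLen = 64 };  // hypothetical limits

// Incomplete keyword list: only the words used in the example program.
static char table[MaxWords][MaxLen] = { "PROGRAM", "VAR", "BEGIN", "END", "WRITE" };
static int entries = 5;

// Returns 1 if a and b are the same when case is ignored, otherwise 0.
static int sameIgnoringCase(const char *a, const char *b) {
  while (*a != '\0' && *b != '\0') {
    if (toupper((unsigned char) *a) != toupper((unsigned char) *b)) return 0;
    a++; b++;
  }
  return *a == *b;                     // equal only if both ended together
}

// Returns the table's spelling of word, adding word first if it is new.
// If the table is full or the word is too long, word is returned unchanged.
static const char *canonical(const char *word) {
  for (int i = 0; i < entries; i++)
    if (sameIgnoringCase(table[i], word)) return table[i];
  if (entries == MaxWords || strlen(word) >= MaxLen) return word;
  strcpy(table[entries], word);
  return table[entries++];
}
```

With this table canonical("begin") gives "BEGIN", and once "Second" has been seen, canonical("seCOND") gives "Second".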

Comments and string literals in the source code being beautified tend to complicate things a little, because words in comments and strings might get confused with identifiers. For the purposes of this exercise you may simply assume initially that comments will never form part of any Clang program you are asked to analyse (see if you can handle string literals, though). Good Heavens! Who ever expected to find a comment in a student program anyway? For keen types, a bonus will be given to solutions that handle comments as well.

One last (first?) word of warning. In my experience student programs that attempt to handle strings in C++ are notoriously bug ridden, because students don't really understand how memory is allocated to strings. See if you can restore my confidence!

And do try to find simple, clean solutions!

CTYPE.H

  isalnum (c)    True if c is a letter or digit
  isalpha (c)    True if c is a letter
  isdigit (c)    True if c is a digit
  iscntrl (c)    True if c is a delete character or ordinary control character
  isascii (c)    True if c is a valid ASCII character
  isprint (c)    True if c is a printable character
  isgraph (c)    Like isprint except that the space character is excluded
  islower (c)    True if c is a lowercase letter
  isupper (c)    True if c is an uppercase letter
  ispunct (c)    True if c is a punctuation character
  isspace (c)    True if c is a space, tab, carriage return, newline, vertical tab, or form-feed
  isxdigit (c)   True if c is a hexadecimal digit
  toupper (c)    Converts c in the range [a-z] to characters [A-Z]
  tolower (c)    Converts c in the range [A-Z] to characters [a-z]
  toascii (c)    Converts c greater than 127 to the range 0-127 by clearing all but the lower 7 bits
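
As a small illustration of these routines in use, the following sketch tests whether a string is "an initial letter followed by other letters and digits only". The function name isIdentifier is hypothetical. The cast to unsigned char is a precaution, since the ctype routines expect either EOF or a value in the unsigned char range.

```cpp
#include <ctype.h>

// Returns 1 if str is a letter followed only by letters and digits, else 0.
static int isIdentifier(const char *str) {
  if (!isalpha((unsigned char) str[0])) return 0;
  for (int i = 1; str[i] != '\0'; i++)
    if (!isalnum((unsigned char) str[i])) return 0;
  return 1;
}
```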

STDIO.H

Input/output library for text and binary files (not all functions shown here, only the most common ones).

  int fclose (FILE *stream);
  /* Closes stream.
     If successful, returns 0.   If unsuccessful, returns EOF. */

  int feof (FILE *stream);
  /* Returns nonzero if end-of-file has been reached on stream. */

  int ferror (FILE *stream);
  /* Returns nonzero if an error has occurred on stream. */

  int fgetc (FILE *stream);
  /* Reads character (or EOF) from stream.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  int fgetchar (void);
  /* Reads a character (or EOF) from stdin.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  char *fgets (char *str, int n, FILE *stream);
  /* Reads a string of at most n-1 characters from stream into str.
     Collects input from stream until a newline character (\n) is found or at
     most n-1 characters are read. (If read, \n is placed in the string.)
     If successful, returns a pointer to the nul-terminated string str.
     If unsuccessful, returns NULL. */

  FILE *fopen (const char *filename, const char *mode);
  /* Opens stream in required mode to external filename.
     If successful, returns pointer to the newly opened stream.
     If unsuccessful, returns NULL. */

  int fprintf (FILE *stream, const char *format { , argument } );
  /* Sends formatted output to stream.
     Uses the same format specifiers as printf, but fprintf sends output to
     the specified stream.
     If successful, returns the number of bytes output.
     If unsuccessful, returns a negative value (EOF in some libraries). */

  int fputc (int c, FILE *stream);
  /* Writes character c to stream.
     If successful, returns c.
     If unsuccessful, returns EOF. */

  int fputchar (int c);
  /* Writes character c to stdout.
     If successful, returns c.
     If unsuccessful, returns EOF. */

  int fputs (const char *str, FILE *stream);
  /* Writes string str to stream.
     If successful, returns a non-negative value (in some libraries, the last character written).
     If unsuccessful, returns EOF. */

  int fscanf (FILE *stream, const char *format { , address } );
  /* Performs formatted input from stream.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value doesn't include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments. */

  int getc (FILE *stream);
  /* Reads character (or EOF) from stream.
     If successful, returns character.
     If unsuccessful, returns EOF. */

  int getchar (void);
  /* Reads character (or EOF) from stdin.
     If successful, returns the character read, after converting it to an
     int without sign extension.
     If unsuccessful, returns EOF. */

  char *gets (char *str);
  /* Reads string str from stdin.
     Collects input from stdin until a newline character (\n) is found.
     \n is not placed in the string.
     If successful, returns a pointer to the nul-terminated string str.
     If unsuccessful, returns NULL.
     gets cannot limit the number of characters stored in str; fgets is safer. */

  int printf (const char *format { , argument } );
  /* Formatted output to stdout.
     Processes a variable number of arguments according to the format,
     sending the output to stdout.
     If successful, returns the number of bytes output.
     If unsuccessful, returns a negative value (EOF in some libraries). */

  int putc (int c, FILE *stream);
  /* Writes character c to stream.
     If successful, returns the character c.
     If unsuccessful, returns EOF. */

  int putchar (int c);
  /* Writes character c on stdout.
     If successful, returns the character c.
     If unsuccessful, returns EOF. */

  int puts (const char *str);
  /* Writes string str to stdout (and appends a newline character).
     If successful, returns a non-negative value (in some libraries, the last character written).
     If unsuccessful, returns EOF. */

  void rewind (FILE *stream);
  /* Repositions file pointer to stream's beginning. */

  int scanf (const char *format { , argument } );
  /* Performs formatted input from stdin.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value does not include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments. */

  int sprintf (char *buffer, const char *format { , argument } );
  /* Performs formatted output to a string buffer.
     If successful, returns the number of bytes output.
     If unsuccessful, returns a negative value (EOF in some libraries). */

  int sscanf (const char *buffer, const char *format { , address } );
  /* Performs formatted input from a string buffer.
     Returns the number of input fields successfully scanned, converted, and
     stored; return value does not include unstored scanned fields.
     Processes input according to the format and places the results in the
     memory locations pointed to by the arguments.
     If sscanf attempts to read past end of buffer, the return value is EOF. */

  int ungetc (int c, FILE *stream);
  /* Pushes the character c back into input stream, so that the next call to
     getc (or to other stream input functions) for stream will return c again.
     If successful, returns c.
     If unsuccessful, returns EOF. */

Predefined streams automatically opened when the program is started.

  stdin      Standard input device.
  stdout     Standard output device.
  stderr     Standard error output device.

Other predefined quantities

  FILE       File control structure for streams.
  NULL       Null pointer value.
  EOF        value of character returned when end-of-file is encountered.
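
The value returned by the scanf family is easy to overlook. As a sketch (parsePair is a hypothetical name), a conversion can be accepted only when the expected number of fields was actually stored:

```cpp
#include <stdio.h>

// Tries to read two integers from text.
// Returns 1 if both were converted and stored, otherwise 0.
static int parsePair(const char *text, int *first, int *second) {
  return sscanf(text, "%d %d", first, second) == 2;
}
```

Given "12 34" this succeeds; given "12 x" only one field is converted, so it fails.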

STRING.H

  /* size_t is an unsigned integer type whose size depends on the implementation. */

  char *strcpy (char *dest, const char *src);
  /* Copies string src to dest.  Returns dest. */

  char *strncpy (char *dest, const char *src, size_t n);
  /* Copies at most n chars from src to dest.  If src has n or more characters, no null
     character is appended, and the dest area will not hold a null-terminated string. */

  size_t strlen (const char *str);
  /* Returns length of str. */

  int strcmp (const char *s1, const char *s2);
  /* Compares one string to another, case significant
     Returns 0 if s1 = s2,  < 0 if s1 < s2, > 0 if s1 > s2. */

  int stricmp (const char *s1, const char *s2);  /* might be called strcasecmp */
  /* Compares one string to another, ignoring case
     Returns 0 if s1 = s2,  < 0 if s1 < s2, > 0 if s1 > s2. */

  char *strstr (const char *str, const char *substr);
  /* Returns pointer to first location of substr within str or NULL. */

  char *strcat (char *s1, const char *s2);
  /* Appends s2 to s1. */

  char *strncat (char *s1, const char *s2, size_t n);
  /* Appends not more than n chars from s2 to s1. */
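
Because strncpy may leave its destination unterminated, it is commonly wrapped so that a terminator is always stored. A minimal sketch, in which copyBounded is a hypothetical name and destSize is the full size of the destination array:

```cpp
#include <string.h>

// Copies src into dest, truncating if necessary, and always leaves dest
// nul-terminated (provided destSize is at least 1).
static void copyBounded(char *dest, size_t destSize, const char *src) {
  if (destSize == 0) return;
  strncpy(dest, src, destSize - 1);
  dest[destSize - 1] = '\0';
}
```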

STDLIB.H

  int abs (int x);
  /* Returns the absolute value of integer x. */

  int atexit (atexit_t func);
  /* Registers termination function.  Returns 0 on success and nonzero on failure. */

  double atof (const char *str);
  /* Converts string str to a floating point number.
     Returns the converted value of str, or 0 if str cannot be converted. */

  int atoi (const char *str);
  /* Converts string str to integer.
     Returns the converted value of str, or 0 if str cannot be converted. */

  void exit (int status);
  /* Terminates program.  Defined values for status are
        EXIT_SUCCESS  Normal program termination
        EXIT_FAILURE  Abnormal program termination
     Before terminating, buffered output is flushed, files are closed, and exit functions are called. */

  void free (void *block);
  /* Frees block previously allocated by a call to malloc. */

  char *getenv (const char *name);
  /* Gets string from environment.
     Returns a pointer to the value associated with name, or NULL if name is not defined in
     the environment. */

  char *itoa (int value, char *str, int radix);
  /* Converts an integer value to a string.  Returns a pointer to the target string.
     For a decimal representation, use radix=10.  For hexadecimal, use radix=16. */

  char *ltoa (long value, char *str, int radix);
  /* Converts a long value to a string.  Returns a pointer to the target string.
     For a decimal representation, use radix=10.  For hexadecimal, use radix=16. */

  void *malloc (size_t size);
  /* Allocates memory.  size is in bytes.  Returns a pointer to the newly allocated block, or NULL if
     insufficient space exists for the new block.  If size == 0, the result depends on the
     implementation: either NULL or a pointer that must not be dereferenced. */

  int rand (void);
  /* Returns random number between 0 and RAND_MAX. */

  int random (int num);
  /* Returns a random integer between 0 and (num-1). */

  void randomize (void);
  /* Initializes the random number generator with a random value.  It uses the
     time function, so you should include time.h when using this routine. */
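
Since much of the trouble with strings comes from forgetting to allow room for the terminating nul, here is a sketch of a string copy made with malloc (the name duplicate is hypothetical). The block must be strlen(src) + 1 bytes long, and the caller is responsible for releasing it with free.

```cpp
#include <stdlib.h>
#include <string.h>

// Returns a freshly allocated copy of src, or NULL if memory is exhausted.
// The caller must eventually pass the result to free.
static char *duplicate(const char *src) {
  char *copy = (char *) malloc(strlen(src) + 1);   // +1 for the nul
  if (copy != NULL) strcpy(copy, src);
  return copy;
}
```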


© P.D. Terry