C++ Input and Output using fscanf, scanf, fprintf and printf

Up till now you have probably done most of your I/O in C++ using the cin and cout streams. The Coco/R system used later in the course works in terms of traditional, less bloated libraries that are present both in C and in C++.

The standard C and C++ libraries include a number of functions for output to the "standard output" device, and for input from the "standard input" device. In MS-DOS implementations these correspond, in their simplest applications, to the screen and keyboard. Chief among these functions are printf (for output) and scanf (for input). They are not the only I/O functions one can use in C, but they are two of the most versatile. Calls to the functions may be described in a general way by

PrintfStatement = "printf" "(" ControlString { "," Expression } ")" ";" .
ScanfStatement = "scanf" "(" ControlString { "," Address } ")" ";" .
ControlString = String .

and some simple examples follow immediately:

         printf("Output results"); 
         printf("firstvar + secondvar = %d\n", thirdvar); 
         printf("%d + %d = %d\n", first, second, third); 

         scanf("%d %d %d", &first, &second, &third); 

The chief difference between printf and scanf lies in the argument list. printf uses variable names, constants and expressions, whereas scanf uses the addresses of variables.

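A simple rule for beginners starting to use scanf for input is that the name of each simple variable (int, float, char and so on) must be preceded by &, whereas the name of a character array that is to receive a string is written without one.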

The control string

The first actual parameter to either type of function call is a string - in most cases a string literal - that we call the control string. The simplest form of printf statement contains only this string. Within this string - if it is specified as a literal string - the usual "escape sequences" are interpreted as such. Thus, for example, we can write

            printf("here is one line\nand here is another\nand another\007"); 

Format specifiers

However, there may be other parameters. In the case of printf these parameters give the values of expressions whose values are to be written; in the case of scanf these parameters specify the addresses of (pointers to) the variables whose values are to be read. A device (first used in FORTRAN) requires that the control string have embedded in it so-called format specifiers or conversion specifiers. That is to say, the control string is not a simple string at all, but a mixture of characters that are to be displayed as themselves, and characters that are to be stripped out of the string and used to decide how to interpret the other parameters in the function call! This is aptly described by Plauger as effectively having a little program contained in the control string, written in its own little programming language. As though learning one new language were not enough, we are now suggesting you learn two more, for the languages in printf and scanf control strings are similar, but different, languages.

The lead-in character for a format specifier is the escape character "%". The presence of this character in the control string means that as many as are necessary of the characters that follow are to be stripped out and turned into a format specifier - unless the next character is also a %. So to print the string

           you should appreciate that %age points are Brownie points 

requires a printf statement

           printf("you should appreciate that %%age points are Brownie points"); 

The general form of the other format specifiers may be described by the EBNF productions:

     FormatSpecifier  =  "%" [ Modifiers ] TypeCharacter .
     Modifiers        =  PrintModifiers | ScanModifiers .

The trailing character in such a sequence, the so-called type character, is described by

     TypeCharacter    =
         "i" | "d"     /* decimal integer */
      |  "u"           /* unsigned decimal number */
      |  "o"           /* unsigned octal integer */
      |  "x" | "X"     /* unsigned hexadecimal integer */
      |  "e" | "E"     /* floating point number, exponential notation */
      |  "f"           /* floating point number, decimal notation */
      |  "g" | "G"     /* equivalent to the more compact of "f" or "e" */
      |  "c"           /* a single character */
      |  "s"           /* a character string */ .

and essentially specifies what type the corresponding parameter is taken to be. Perhaps this may be clarified by an example:

        float Average;  int Number;  char KeyCode;
        printf("The average is %e for the %d samples identified by %c", Average, Number, KeyCode);

printf Modifiers

A basic format specification can be modified by inserting so-called modifiers between the % and the type conversion character. We can specify the modifiers, the order of which is important, by a further EBNF production:

  PrintModifiers  =  { "-" | "+" | " " | "#" }
                     [ "0" ]
                     [ Number | "*" ]
                     [ "." ( Number | "*" ) ]
                     [ "l" | "L" ] .

This is written out again below (highly commented).

  PrintModifiers  =
  {  "-"                    /*  The item will be printed "left justified",
                                beginning at the left of its field width (as
                                defined below).  Normally the item is printed
                                so that it ends at the right of its field.  An
                                example is %-10d */

   | "+"                    /*  The value is always displayed with a numeric
                                sign (only relevant for integers and floating
                                point values). */

   | " "                    /*  Positive values are displayed with a leading
                                space (only relevant for integers and floating
                                point values). */

   | "#"  }                 /*  Octal values will be preceded by a zero.
                                Hexadecimal values will be preceded by 0x.
                                Floating point values will always have a
                                decimal point. */

   [ "0" ]                  /*  Values are left padded with zeros, rather than
                                spaces. */

   [ Number | "*" ]         /*  The minimum field width.  A wider field will
                                be used if the printed number or string won't
                                fit in the field.  For example, %4d.  The
                                meaning of * is discussed below */

   [ "." ( Number | "*" ) ] /*  For float types, the number of digits
                                to be printed to the right of the decimal.
                                For integer types, the minimum number of
                                digits to be displayed (adding leading zeros
                                if required).  For character strings, the
                                maximum number of characters to be printed.
                                For example, %4.2f (2 decimal places in a
                                field 4 characters wide).  The meaning of * is
                                discussed below */

   [ "l" | "L" ]            /*  "l": the corresponding integer item is long
                                rather than int, for example %ld.  "L": the
                                corresponding floating point item is long
                                double, for example %Lf. */ .

As an illustration of the use of printf with fairly complex format specifiers together with their modifiers, consider the following:

    #include <stdio.h>
    #define BLURB "Outstanding acting"

    int main (void) {
      printf("/%-10.4d/\n", 36);       /* printing an integer */
      printf("/%10.3f/\n", 1234.56);   /* printing floating point */
      printf("/%10.3e/\n", 1234.56);   /* printing floating point */
      printf("/%-22.5s/\n", BLURB);    /* printing a string */
      printf("%x\n", 336);             /* hexadecimal */
      return 0;
    }

The output of this program would be: 

    /0036      /                       ( four digits displayed in total width 10 ) 
    /  1234.560/                       ( three digits after point in total width 10 ) 
    / 1.235e+03/                       ( exponential form displayed in total width 10 )
    /Outst                 /           ( only first five characters in total width 22 ) 
    150                                ( hexadecimal display in minimum width 3 ) 

The last printf statement printed a decimal constant as a hexadecimal string. The constant 336 is held in memory as a binary int (two bytes on 16-bit MS-DOS compilers, usually four on current systems); the %x specifier asks printf to display that same value in base 16, giving 150, since 1 x 256 + 5 x 16 + 0 = 336.

Points to watch when using printf

The general ideas behind using printf are quite easily understood, but there are a few points to watch:

The scanf control string

scanf also uses a control string. In its simplest use with several arguments, the format specifiers are separated by white space, implying that white space (blanks, tabs and newlines) will also be used to divide up the input into corresponding fields. That is, the values the user types in can be separated by any number of blanks, tabs or carriage returns.

Thus code like

            int n, m; 
            char c1, c2; 
            char string[12]; 
            scanf("%d %s %c %c %d", &n, string, &c1, &c2, &m); 

corresponds closely to less opaque Modula-2 code of the form below (some calls to SkipSpaces need not be coded explicitly):

            (* SkipSpaces; *) ReadInt(n); 
            (* SkipSpaces; *) ReadWord(string); 
            SkipSpaces; ReadChar(c1); 
            SkipSpaces; ReadChar(c2); 
            (* SkipSpaces; *) ReadInt(m); 

So when presented with input like

            345  Hello  a x      45 

the effect will be to assign values of 345 and 45 to the integers n and m, the string "Hello" to the array string, and the characters 'a' and 'x' to the character variables c1 and c2. Note that the "%s" format specifier reads a white space delimited "word", and not the remainder of a text line.

Things get more exciting if the format specifiers are written without the intervening white space. Had the scanf call been written

            scanf("%d%s%c%c%d", &n, string, &c1, &c2, &m); 

the intention would have been more like

            (* SkipSpaces; *) ReadInt(n); 
            (* SkipSpaces; *) ReadWord(string); 
            ReadChar(c1); 
            ReadChar(c2); 
            (* SkipSpaces; *) ReadInt(m); 

The main difference lies in the behaviour of the %c specifier, which will now read in the next character regardless of whether it is a white space character. %s still requires white space to signal the end of the string.

So when presented with input like

            345  Hello a 45 

the effect will be to assign values of 345 and 45 to the integers n and m, the string "Hello" to the array string, and the characters ' ' and 'a' to the character variables c1 and c2. c1 is assigned the space that followed (and, in a sense, terminated) the word Hello. The effect would have been the same had the data been

            345Hello a45 

It is possible to have non-white space characters between the specifiers as well. This allows scanf to expect, read, accept, but then ignore these characters when they appear between (after) the input data. This provides a very powerful feature in some applications.

For example, had the statement been coded as

            scanf("%d,%s %c...%c?%d", &n, string, &c1, &c2, &m); 

data like

            345,  Hello a...b?  45 

would have been deemed "acceptable" since the 345 is correctly followed immediately by a comma, the 'a' is correctly followed by the ellipsis "...", and the 'b' is correctly followed immediately by a "?" before the final 45.

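The action that scanf takes when it reaches an ordinary character in the control string thus depends on whether it is a white space character (space, tab, line feed) or not. A white space character in the control string causes scanf to skip any amount of white space (including none at all) in the input. Any other ordinary character must match the next input character exactly; if it does not, scanf stops at that point and the remaining variables are left unassigned.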

scanf Modifiers

There is often no need for a scanf control string to include characters other than simple format specifiers. As in the case of printf, the specifiers may include optional modifiers:

  ScanModifiers  =
      [ "*" ]       /*   The item will be read, but the value will be
                         discarded and not assigned - useful for skipping over
                         unwanted fields in a data file */

      [ Number ]    /*   A non-zero number giving the maximum field width to
                         be scanned - useful when reading long data strings
                         into short arrays. Leading white space is not
                         reckoned into the total */

      [ "h" | "l" ] /*   The corresponding data item is "short" or "long"
                         rather than simply "int"; with e, f and g, "l" means
                         "double" rather than "float" */ .

Points to watch when using scanf

The general ideas behind using scanf are quite easily understood, but there are a few points to watch:


Other simple input and output functions

A few other I/O functions of great use, if only because they are simpler, and probably faster than using scanf and printf (in situations when they can be applied) are as follows.


General File I/O

For many applications we need to have more than two files connected to a program. Clearly this calls for the provision of more general I/O library functions; not surprisingly, standardised versions of these are also provided, effectively "imported" from stdio, and, fortunately, are rather similar to those we have already seen.

One needs to be able to declare variables of an abstract "file" type. The stdio library effectively declares such a type with the name FILE (all in upper case). In C, as you know, one does not have to mention all the "imported" identifiers when one uses a library (they all come for free, as does the chaos when you get name clashes!). So declaration of file variables is done in C++ with code like

            FILE * infile, * outfile; 

where the *, as is probably now fairly obvious, relates to the fact that we are really talking of pointer types.

Having declared the logical files with which the program is going to work, one needs to associate them with physical files stored on the disk. This is done with calls to the fopen function:

            infile = fopen("FILE.DAT", "r"); 
            outfile = fopen("RESULTS", "w"); 

These examples associate the logical file infile with the physical file FILE.DAT in the current directory on the current disk drive, and the logical file outfile with the physical (disk) file RESULTS. The second (string) parameter given for fopen is an indication of how the file is to be opened:

     "r" implies "open for reading only"
     "w" implies "open for writing only, starting at the beginning of the file" (any existing contents are lost)
     "a" implies "open for writing only, starting at the end of the file"

One can also open a file for both reading and writing by adding a "+" to the mode: "r+" opens an existing file for update, "w+" creates the file (or discards its contents) first, and "a+" allows reading while sending all output to the end of the file. If the file cannot be opened for some reason then fopen returns NULL (the NIL pointer) - conscientious programmers will always check for such disasters, of course!

All C++ programs automatically have three files opened as they start to run. These are given the names stdin, stdout and stderr, referring to the standard input (usually the keyboard), the standard output (usually the screen) and the standard error file (opened for writing, and usually also the screen, so that error messages remain visible even when standard output is redirected to a disk file).

Having opened the files one needs to be able to access them. For each of the I/O functions we have looked at so far there are alternative forms that take an extra parameter specifying the file that is to be used. Those corresponding to printf and scanf may be described in the usual EBNF notation

     FprintfStatement =  "fprintf" "(" FileId "," ControlString { "," Expression } ")" ";" .
     FscanfStatement  =  "fscanf" "(" FileId "," ControlString { "," Address } ")" ";" .
     FileId           =  Identifier .

For example:

    fprintf(outfile, "%s is %d years old\n", name, age); 

Since scanf uses the standard file stdin, and printf uses the standard file stdout we have the following equivalences:

    printf (ControlString, Param1, ...)  ==  fprintf(stdout, ControlString, Param1, ...) 
    scanf (ControlString, Param1, ...)   ==  fscanf(stdin, ControlString, Param1, ...) 

Not surprisingly, we also have functions corresponding to getchar, gets, putchar and puts. Two of these are very closely analogous:

     getc(FileId)                   returns the next character (byte) in the
                                    file identified as FileId, or EOF if the
                                    file is exhausted.

     putc(ch, FileId)               appends the character (byte) ch onto the
                                    file identified as FileId

The string handling ones are not quite so analogous

     fgets(string, length, FileId)  reads at most length - 1 bytes from FileId
                                    into the specified string, or stops at a
                                    linemark, whichever happens first, and
                                    then adds a terminating null byte.  Unlike
                                    the simpler gets function, any line mark
                                    read is stored in string.

     fputs(string, FileId)          appends the specified string to FileId.
                                    Unlike the simpler puts function, it does
                                    not append an extra line mark (of course,
                                    if a linemark is already present in
                                    string, this is appended to the file).

Note also that in these procedures the FileId parameter comes last, not first, as it does in fprintf and fscanf.

When a program has finished with a file it should close it, although this often "fails safe", as a C++ system should close all its files automatically when the program terminates. Explicitly closing files is done using the fclose function, exemplified by

            fclose(infile); 

The reader will probably not be surprised to learn that fclose returns a value of EOF if the file cannot be closed.