Input/Output - Scanf and Printf

C++ Input and Output using fscanf, scanf, fprintf and printf

Up till now you have probably done most of your I/O in C++ using the cin and cout streams. The Coco/R system used later in the course works in terms of traditional, less bloated libraries that are present both in C and in C++.

The standard C and C++ libraries include a number of functions for output to the "standard output" device, and for input from the "standard input" device. In MS-DOS implementations these correspond, in their simplest applications, to the screen and keyboard. Chief among these functions are printf (for output) and scanf (for input). They are not the only I/O functions one can use in C, but they are two of the most versatile. Calls to the functions may be described in a general way by

PrintfStatement = "printf" "(" ControlString { "," Expression } ")" ";" .
ScanfStatement = "scanf" "(" ControlString { "," Address } ")" ";" .
ControlString = String .

and some simple examples follow immediately:

         printf("Output results"); 
         printf("firstvar + secondvar = %d\n", thirdvar); 
         printf("%d + %d = %d\n", first, second, third); 

         scanf("%d %d %d", &first, &second, &third);

The chief difference between printf and scanf lies in the argument list. printf uses variable names, constants and expressions, whereas scanf uses the addresses of variables.

A simple rule for beginners starting to use scanf for input is that

To read a value for a variable of one of the basic types, form addresses by preceding the variable name with an ampersand &.
To read in a value for a string variable (that is, a character array), don't use an &.
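
The rule can be seen at work in the following minimal sketch. It uses sscanf, which obeys the same control string rules as scanf but takes its data from a string rather than the keyboard (an assumption made here purely so that the example needs no typed input). The helper name read_age_and_name and the array size of 20 are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts an integer and a word from text.
   Returns the number of values that sscanf assigned (2 on success). */
int read_age_and_name(const char *text, int *age_out, char *name_out) {
    int age = 0;
    char name[20] = "";

    /* age is a basic type, so its address is formed with &;
       name is a character array, so it is passed without an &. */
    int assigned = sscanf(text, "%d %19s", &age, name);

    *age_out = age;
    strcpy(name_out, name);   /* the caller must supply at least 20 bytes */
    return assigned;
}
```

Given the text "42 Terry", the sketch should leave 42 in age and "Terry" in name.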

The control string

The first actual parameter to either type of function call is a string - in most cases a string literal - that we call the control string. The simplest form of printf statement contains only this string. Within this string - if it is specified as a literal string - the usual "escape sequences" are interpreted as such. Thus, for example, we can write

            printf("here is one line\nand here is another\nand another\007");

Format specifiers

However, there may be other parameters. In the case of printf these parameters give the values of expressions whose values are to be written; in the case of scanf these parameters specify the addresses of (pointers to) the variables whose values are to be read. A device (first used in FORTRAN) requires that the control string have embedded in it so-called format specifiers or conversion specifiers. That is to say, the control string is not a simple string at all, but a mixture of characters that are to be displayed as themselves, and characters that are to be stripped out of the string and used to decide how to interpret the other parameters in the function call! This is aptly described by Plauger as effectively having a little program contained in the control string, written in its own little programming language. As though learning one new language were not enough, we are now suggesting you learn two more, for the languages in printf and scanf control strings are similar, but different, languages.

The lead-in character for a format specifier is the escape character "%". The presence of this character in the control string means that as many as are necessary of the characters that follow are to be stripped out and turned into a format specifier - unless the next character is also a %. So to print the string

           you should appreciate that %age points are Brownie points

requires a printf statement

           printf("you should appreciate that %%age points are Brownie points");

The general form of the other format specifiers may be described by the EBNF productions:

     FormatSpecifier  =  "%" [ Modifiers ] TypeCharacter .
     Modifiers        =  PrintModifiers | ScanModifiers .

The trailing character in such a sequence, the so-called type character, is described by

     TypeCharacter    =
         "i" | "d"     /* decimal integer */
      |  "u"           /* unsigned decimal number */
      |  "o"           /* unsigned octal integer */
      |  "x" | "X"     /* unsigned hexadecimal integer */
      |  "e" | "E"     /* floating point number, exponential notation */
      |  "f"           /* floating point number, decimal notation */
      |  "g" | "G"     /* equivalent to the more compact of "f" or "e" */
      |  "c"           /* a single character */
      |  "s"           /* a character string */ .

and essentially specifies what type the corresponding parameter is taken to be. Perhaps this may be clarified by an example:

        float Average;  int Number;  char KeyCode;
        printf("The average is %e for the %d samples identified by %c", Average, Number, KeyCode);
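
As a further illustration, here is a small sketch that displays one and the same value under several type characters. It uses snprintf, which accepts the same control strings as printf but stores the result in a character array (an assumption made here so that the result can be inspected). The helper name show_types is hypothetical, and the %c result assumes an ASCII character set.

```c
#include <stdio.h>

/* Hypothetical helper: formats value as unsigned decimal, octal,
   lower and upper case hexadecimal, and as a single character.
   Returns the number of characters that the full result requires. */
int show_types(char *buf, size_t size, unsigned int value) {
    return snprintf(buf, size, "%u %o %x %X %c",
                    value, value, value, value, (int) value);
}
```

Called with the value 90, the sketch should produce the text "90 132 5a 5A Z".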

printf Modifiers

A basic format specification can be modified by inserting so-called modifiers between the % and the type conversion character. We can specify the modifiers, the order of which is important, by a further EBNF production:

  PrintModifiers  =  { "-" | "+" | " " | "#" }
                     [ "0" ]
                     [ Number | "*" ]
                     [ "." ( Number | "*" ) ]
                     [ "l" | "L" ] .

This is written out again below (highly commented).

  PrintModifiers  =
  {  "-"                    /*  The item will be printed "left justified",
                                beginning at the left of its field width (as
                                defined below).  Normally the item is printed
                                so that it ends at the right of its field.  An
                                example is %-10d */

   | "+"                    /*  The value is always displayed with a numeric
                                sign (only relevant for integers and floating
                                point values). */

   | " "                    /*  Positive values are displayed with a leading
                                space (only relevant for integers and floating
                                point values). */

   | "#"  }                 /*  Octal values will be preceded by a zero.
                                Hexadecimal values will be preceded by 0x.
                                Floating point values will always have a
                                decimal point. */

   [ "0" ]                  /*  Values are left padded with zeros, rather than
                                spaces. */

   [ Number | "*" ]         /*  The minimum field width.  A wider field will
                                be used if the printed number or string won't
                                fit in the field.  For example, %4d.  The
                                meaning of * is discussed below */

   [ "." ( Number | "*" ) ] /*  For float types, the number of digits
                                to be printed to the right of the decimal.
                                For integer types, the minimum number of
                                digits to be displayed (adding leading zeros
                                if required).  For character strings, the
                                maximum number of characters to be printed.
                                For example, %4.2f (2 decimal places in a
                                field 4 characters wide).  The meaning of * is
                                discussed below */

   [ "l" | "L" ]            /*  With l, the corresponding integer data item
                                is long rather than int.  For example, %ld.
                                With L, the corresponding floating point item
                                is long double rather than double. */ .

As an illustration of the use of printf with fairly complex format specifiers together with their modifiers, consider the following:

    #include <stdio.h>
    #define BLURB "Outstanding acting"

    int main(void) {
      printf("/%-10.4d/\n", 36);       /* printing an integer */
      printf("/%10.3f/\n", 1234.56);   /* printing floating point */
      printf("/%10.3e/\n", 1234.56);   /* printing floating point */
      printf("/%-22.5s/\n", BLURB);    /* printing a string */
      printf("%x\n", 336);             /* hexadecimal */
    }

The output of this program would be: 

    /0036      /                       ( four digits displayed in total width 10 ) 
    /  1234.560/                       ( three digits after point in total width 10 ) 
    / 1.235e+03/                       ( exponential form displayed in total width 10 ) 
    /Outst                 /           ( only first five characters in total width 22 ) 
    150                                ( hexadecimal display in minimum width 3 )

The last printf statement printed a decimal constant in hexadecimal notation. The value held in memory is the same whichever notation was used to write the constant in the source code; the %x specifier simply asks printf to convert that value to hexadecimal digits for display (336 = 1 x 256 + 5 x 16 + 0, hence 150).
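
The program above does not exercise the "+", " ", "0" and "#" modifiers. A small sketch of their effect follows. It uses snprintf, which accepts the same control strings as printf but stores the result in a character array (an assumption made here so that the result can be inspected); the helper name show_flags is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats n once with each modifier so that the
   effects can be compared side by side.  Returns the number of
   characters that the full result requires. */
int show_flags(char *buf, size_t size, int n) {
    return snprintf(buf, size, "[%+d] [% d] [%05d] [%#o] [%#x]",
                    n, n, n, (unsigned) n, (unsigned) n);
}
```

Called with the value 42, the sketch should produce the text "[+42] [ 42] [00042] [052] [0x2a]".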

Points to watch when using printf

The general ideas behind using printf are quite easily understood, but there are a few points to watch:

As usual, in C, there is no insistence that the "type" of a type character and the "type" of the corresponding expression or variable need correspond, and the language does not require any check at compile time that a programmer is doing the right thing (although many modern compilers can warn about mismatches when the control string is a literal).
There is not even a check that the number of format specifiers matches the number of further parameters - indeed, this is impossible to do in general at compile time, since the control string does not have to be a literal string, but can be computed at run time. If there are fewer specifiers than parameters, the surplus parameters will be ignored (that is, the function is "driven" by the number of % specifiers in the control string). If there are more specifiers than parameters the behaviour is undefined, and you get what you deserve (typically extraneous garbage output).
A parameter corresponding to a "%s" specifier is taken to be of type "char *". This means that it can have the appearance of an "array name" as in the following code
```
            char fossil [12] = "Terry"; 
            printf("%s" , fossil); 
```
but we can also write the equivalent code
```
            char * fossil = "Terry"; 
            printf("%s" , fossil); 
```
For the field width and precision settings the value can be specified at run-time by using an asterisk to denote that the value is given as a parameter. For example, the following program segment prints the value of the variable x in a field width specified by the value of width:
```
            printf("%*d", width, x); 
```
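The same device works for the precision. The sketch below supplies both values at run time. It uses snprintf, which accepts the same control strings as printf but stores the result in a character array (an assumption made here so that the result can be inspected); the helper name format_amount is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: right-justifies value in a field whose width
   and precision are both supplied at run time. */
int format_amount(char *buf, size_t size, int width, int places, double value) {
    /* Each * consumes one int parameter, in order, before the value itself. */
    return snprintf(buf, size, "%*.*f", width, places, value);
}
```

With a width of 8 and 2 places, the value 3.14159 should appear as "3.14" preceded by four spaces.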
As we have been using it, it appears to be a "regular procedure", but printf is actually a function, returning an integer value. (Recall that any expression can act as a statement in C, and that some expressions have side effects!) The value returned is the number of characters successfully written if nothing went wrong, or a negative value if some error occurred during processing (many implementations return EOF, a constant "exported" from the header file stdio.h, and usually equivalent to -1). (Things only go "wrong" for the purposes of this discussion when a low level driver fails - for example when trying to write to a non-existent device, or a full disk. Getting a mismatch between the "type character" and the type of the expression being written is regarded as perfectly permissible in C!)
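A minimal sketch of how the returned value might be used is given below; the helper name print_total is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: prints a labelled total and reports whether the
   write succeeded.  Returns the character count, or -1 on failure. */
int print_total(int total) {
    int written = printf("total = %d\n", total);
    if (written < 0) {          /* a negative result signals an output error */
        return -1;
    }
    return written;
}
```

A call such as print_total(7) should write ten characters (including the newline) and return 10.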
Although all our examples have used a literal string to exemplify the control string, it is possible for this string to be read in (or constructed in some other way). Be aware of a possible misconception here: Suppose we have the following code
```
            char format [30]; 
            int i = 56, j = 678, k = 89; 
            scanf("%29s", format); 
            printf(format, i, j, k); 
```
and present it with input data consisting of the string
```
            i=%5d\nj=%5d\nk=%5d 
```
The output will be
```
            i=   56\nj=  678\nk=   89 
```
and not
```
            i=   56 
            j=  678 
            k=   89 
```
That is, the characters "\n" that appear in the data are not interpreted as "newline" when the string is used as a control string. "\n" and other such escape sequences are source code representations of special characters, not data file representations of them, and are handled when a literal string is compiled, and not dynamically at run time. By contrast, the "%" characters must be presented in the data string.
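
If one really does want "\n" in a data string to act as a newline, the program must make the substitution itself. The following is a minimal sketch of one way to do so, handling only the "\n" sequence; the helper name expand_newlines is hypothetical and is not part of any standard library.

```c
/* Hypothetical helper: rewrites s in place so that each two-character
   sequence backslash-n becomes a single newline character. */
void expand_newlines(char *s) {
    char *out = s;
    while (*s != '\0') {
        if (s[0] == '\\' && s[1] == 'n') {
            *out++ = '\n';      /* replace the pair by one character */
            s += 2;
        } else {
            *out++ = *s++;      /* copy every other character unchanged */
        }
    }
    *out = '\0';
}
```

Calling expand_newlines(format) between the scanf and the printf in the fragment above should then give the three-line output.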

The scanf control string

scanf also uses a control string. The simplest use of this function when used with several arguments uses white space between the format specifiers, implying that white space (blanks, tabs and newlines) will also be used to decide how to divide up the input into corresponding fields. That is, the values the user types in can be separated by any number of blanks, tabs or carriage returns.

Thus code like

            int n, m; 
            char c1, c2; 
            char string[12]; 
            scanf("%d %s %c %c %d", &n, string, &c1, &c2, &m);

corresponds closely to less opaque Modula-2 code of the form below (some calls to SkipSpaces need not be coded explicitly):

            (* SkipSpaces; *) ReadInt(n); 
            (* SkipSpaces; *) ReadWord(string); 
            SkipSpaces; ReadChar(c1); 
            SkipSpaces; ReadChar(c2); 
            (* SkipSpaces; *) ReadInt(m);

So when presented with input like

            345  Hello  a x      45

the effect will be to assign values of 345 and 45 to the integers n and m, the string "Hello" to the array string, and the characters 'a' and 'x' to the character variables c1 and c2. Note that the "%s" format specifier reads a white space delimited "word", and not the remainder of a text line.

Things get more exciting if the format specifiers are written without the intervening white space. Had the scanf call been written

            scanf("%d%s%c%c%d", &n, string, &c1, &c2, &m);

the intention would have been more like

            (* SkipSpaces; *) ReadInt(n); 
            (* SkipSpaces; *) ReadWord(string); 
            ReadChar(c1); 
            ReadChar(c2); 
            (* SkipSpaces; *) ReadInt(m);

The main difference lies in the behaviour of the %c specifier, which will now read in the next character regardless of whether it is a white space character. %s still requires white space to signal the end of the string.

So when presented with input like

            345  Hello a 45

the effect will be to assign values of 345 and 45 to the integers n and m, the string "Hello" to the array string, and the characters ' ' and 'a' to the character variables c1 and c2. c1 is assigned the space that followed (and, in a sense, terminated) the word Hello. The effect would have been the same had the data been

            345Hello a45
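
The difference between the two control strings can be checked with a small sketch. It uses sscanf, which obeys the same control string rules as scanf but takes its data from a string (an assumption made here so that no typed input is needed). The helper name scan_record is hypothetical, and a width of 11 has been added to %s to protect the 12 character array.

```c
#include <stdio.h>

/* Hypothetical helper: scans text using the control string with white
   space between the specifiers (spaced != 0) or without it (spaced == 0).
   The two characters read are returned through c1 and c2; the function
   result is the number of values assigned (5 on success). */
int scan_record(const char *text, int spaced, char *c1, char *c2) {
    int n = 0, m = 0;
    char word[12] = "";
    const char *control = spaced ? "%d %11s %c %c %d" : "%d%11s%c%c%d";
    return sscanf(text, control, &n, word, c1, c2, &m);
}
```

With the spaced form and the first sample input, c1 and c2 should receive 'a' and 'x'; with the unspaced form and either of the later sample inputs they should receive ' ' and 'a'.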

It is possible to have non-white space characters between the specifiers as well. This allows scanf to expect, read, accept, but then ignore these characters when they appear between (after) the input data. This provides a very powerful feature in some applications.

For example, had the statement been coded as

            scanf("%d,%s %c...%c?%d", &n, string, &c1, &c2, &m);

data like

            345,  Hello a...b?  45

would have been deemed "acceptable" since the 345 is correctly followed immediately by a comma, the 'a' is correctly followed by the ellipsis "...", and the 'b' is correctly followed immediately by a "?" before the final 45.

The action that scanf takes when it reaches an ordinary character in the control string thus depends on whether it is a white space character (space, tab, line feed) or not.

When scanf encounters one or more white space characters in the control string, the action is repeatedly to read white space characters from the input, until reaching a non-space character, which is then buffered, ready to be processed further.
When scanf encounters a non-white-space character in the control string, it compares it with the next character in the data. If the characters match, the input character is discarded, and processing continues with the next character of the control string. If the characters do not match, the input character is buffered (put back) and scanf returns at once, leaving any remaining variables unassigned.
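
A common application of ordinary characters in the control string is the reading of dates. The following minimal sketch uses sscanf, which obeys the same rules as scanf but takes its data from a string (an assumption made here so that no typed input is needed); the helper name parse_date is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: parses a date written as YYYY-MM-DD.  The '-'
   characters in the control string must match the data exactly.
   Returns the number of fields assigned (3 on success).  No & is
   needed here because year, month and day are already addresses. */
int parse_date(const char *text, int *year, int *month, int *day) {
    return sscanf(text, "%d-%d-%d", year, month, day);
}
```

Data such as "2024-03-15" should yield three assignments, whereas "2024/03/15" should stop at the first '/' after assigning only the year.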

scanf Modifiers

There is often no need for a scanf control string to include characters other than simple format specifiers. As in the case of printf, the specifiers may include optional modifiers:

  ScanModifiers  =
      [ "*" ]       /*   The item will be read, but the value will be
                         discarded and not assigned - useful for skipping over
                         unwanted fields in a data file */

      [ Number ]    /*   A non-zero number giving the maximum field width to
                         be scanned - useful when reading long data strings
                         into short arrays. Leading white space is not
                         reckoned into the total */

      [ "h" | "l" ] /*   The corresponding data item is "short" (h) or "long"
                         (l) rather than simply "int"; with e, f or g, l means
                         "double" rather than "float" */ .
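
The three kinds of modifier can be seen together in the following minimal sketch. It uses sscanf, which obeys the same rules as scanf but takes its data from a string (an assumption made here so that no typed input is needed); the helper name scan_with_modifiers is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: the first word of text is read but discarded (%*s),
   the next three digits at most are read as an int (%3d), and whatever
   digits follow are read as a long (%ld).  Returns the number of values
   assigned; the discarded word is not counted. */
int scan_with_modifiers(const char *text, int *part, long *big) {
    return sscanf(text, "%*s %3d %ld", part, big);
}
```

Given the data "id 123456", the width of 3 should split the run of digits so that part receives 123 and big receives 456.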

Points to watch when using scanf

The general ideas behind using scanf are quite easily understood, but there are a few points to watch:

Care must be taken to supply the parameters as "addresses", usually requiring them to be preceded with the "address of" operator &.
As always, there is no insistence that the "type" of a type character and the "type" of the variable at the corresponding address need correspond, or even, perhaps, that an address is correctly specified, and no check is made at compile time that a programmer is doing the right thing.
There is no check that the number of format specifiers matches the number of further parameters.
A parameter corresponding to a "%s" specifier is of type "char *". This means that it can have the appearance of an "array name" as in the following code
```
            char fossil [12]; 
            scanf("%s" , fossil); 
```
If one attempts to use the "%s" specifier, unguarded by a limiting width restrictor, complete and utter chaos can result if the string read in is too long to be stored. Of course, if you are the sort of programmer who likes your programs simply to overwrite themselves with data, then C++ should be just up your street! In the above example one would be extremely foolish not to have used code like
```
            char fossil [12]; 
            scanf("%11s" , fossil); 
```
Like printf, scanf is actually an integer function. The value that it returns is the number of input items successfully converted and assigned (items suppressed with * are not counted), or EOF if the data runs out before the first conversion, so it is quite easy to check for input errors, just as one should always do (but often omits to do).
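
A minimal sketch of such a check is given below. It uses sscanf, which returns its result in the same way as scanf but takes its data from a string (an assumption made here so that no typed input is needed); the helper name parse_two_ints and its return codes are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: succeeds only if text holds two integers.
   Returns 1 on success, 0 on malformed data, -1 if the data ran out. */
int parse_two_ints(const char *text, int *a, int *b) {
    int assigned = sscanf(text, "%d %d", a, b);
    if (assigned == EOF) {
        return -1;              /* no data before the first conversion */
    }
    if (assigned != 2) {
        return 0;               /* a value was missing or malformed */
    }
    return 1;
}
```

Data such as "12 34" should succeed, "12 abc" should be reported as malformed, and an empty string should be reported as exhausted.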

Putting the newline character "\n" at the end of a format string does not necessarily cause scanf to advance to the start of a new line. To scanf, a newline in a format string is simply a "white space" character, like ' ' or '\t' - all these cause scanf to advance to the next non-white space character.

A final example may help to round off the discussion:

    /* A nosy, informative program */
    #include <stdio.h>
    #include <string.h>
    #define DENSITY 900.0 /* approximate human density in kg per cubic metre */
    int main(void) {
      float weight, volume; int size, letters; char name[40];
      printf("Hi, What's your first name?\n");  scanf("%39s", name);  /* width guards name[40] */
      printf("%s, What's your mass in Kilograms?\n", name);
      if (scanf("%f", &weight) != 1) {  /* check that a number really was read */
        printf("Sorry, that was not a number.\n");
        return 1;
      }
      size = (int) sizeof(name);        /* bytes reserved for the array */
      letters = (int) strlen(name);     /* characters actually stored */
      volume = weight / DENSITY;
      printf("Well, %s, your volume is %2.2f cubic metres.\n", name, volume);
      printf("Also, your first name has %d letters,\n", letters);
      printf("and we used %d bytes to store it in.\n", size);
      return 0;
    }

Other simple input and output functions

A few other I/O functions of great use, if only because they are simpler, and probably faster than using scanf and printf (in situations when they can be applied) are as follows.

The function getchar() reads the next single character from the standard input. When reading from the keyboard, <enter> is returned as '\n'. If the end of file is reached, the value returned is EOF.
When using this function on many operating systems, input from the keyboard is "buffered"; only when <enter> is typed is the keyboard buffer released to the program. This gives one the facility to "edit" the keyboard input before passing it to the program, but is unsuitable for some interactive programs that must respond to single character key-presses immediately.
The function gets(s) reads characters and stores them in the array s until it encounters the end of line, which is accepted, discarded, but not stored as part of the string - the usual NUL terminator is stored instead. Like the unprotected %s conversion in scanf, this one is dangerous if the input string is too long! For this reason gets was removed from the C standard in C11; the fgets function described later is the safer alternative. The function returns a NIL pointer (NULL) if an error occurs, and a pointer to the string otherwise.
The function putchar(ch) appends the single character ch onto the standard output file and returns the same character as the value of the function (of course, putchar is usually invoked as a "regular" procedure call!).
The function puts(s) writes the string s onto the standard output file, and then appends a line mark (that is, equivalent to WriteString(s); WriteLn) and, since it may be of fascinating interest, returns a non-negative value if it succeeds (exactly which value depends on the implementation) or EOF if it fails.
The function getche() (imported from conio.h) is MS-DOS specific and reads the next single character from standard input, echoing it to the screen. When reading from the keyboard, <enter> is returned as '\r'. Input is not "buffered", so this function is suitable in applications where getchar is unsuitable.
The function getch() (also imported from conio.h) reads the next single character from the keyboard without echoing it (useful for applications like accepting passwords).
The MS-DOS specific function putch(ch) (imported from conio.h) displays the single character ch on the screen.

General File I/O

For many applications we need to have more than two files connected to a program. Clearly this calls for the provision of more general I/O library functions; not surprisingly, standardised versions of these are also provided, effectively "imported" from stdio, and, fortunately, are rather similar to those we have already seen.

One needs to be able to declare variables of an abstract "file" type. The stdio library effectively declares such a type with the name FILE (all in upper case). In C, as you know, one does not have to mention all the "imported" identifiers when one uses a library (they all come for free, as does the chaos when you get name clashes!). So declaration of file variables is done in C++ with code like

            FILE * infile, * outfile;

where the *, as is probably now fairly obvious, relates to the fact that we are really talking of pointer types.

Having declared the logical files with which the program is going to work, one needs to associate them with physical files stored on the disk. This is done with calls to the fopen function:

            infile = fopen("FILE.DAT", "r"); 
            outfile = fopen("RESULTS", "w");

These examples associate the logical file infile with the physical file FILE.DAT in the current directory on the current disk drive, and the logical file outfile with the physical (disk) file RESULTS. The second (string) parameter given for fopen is an indication of how the file is to be opened:

"r" implies "open for reading only"
"w" implies "open for writing only, starting at the beginning of the file"
"a" implies "open for writing only, starting at the end of the file"

One can also append a + to an access mode, as in "r+", if an existing file is to be opened for both reading and writing. If the file cannot be opened for some reason then fopen returns NULL (the NIL pointer) - conscientious programmers will always check for such disasters, of course!

All C++ programs automatically have three files opened as they start to run. These are given the names stdout, stdin and stderr, referring to the standard output (usually the screen), standard input (usually the keyboard) and the standard error file (an output file intended for error messages, usually also the screen).

Having opened the files one needs to be able to access them. For each of the I/O functions we have looked at so far there are alternative forms that take an extra parameter specifying the file that is to be used. Those corresponding to printf and scanf may be described in the usual EBNF notation

     FprintfStatement =  "fprintf" "(" FileId "," ControlString { "," Expression } ")" ";" .
     FscanfStatement  =  "fscanf" "(" FileId "," ControlString { "," Address } ")" ";" .
     FileId           =  Identifier .

For example:

    fprintf(outfile, "%s is %d years old\n", name, age);

Since scanf uses the standard file stdin, and printf uses the standard file stdout we have the following equivalences:

    printf (ControlString, Param1, ...)  ==  fprintf(stdout, ControlString, Param1, ...) 
    scanf (ControlString, Param1, ...)   ==  fscanf(stdin, ControlString, Param1, ...)

Not surprisingly, we also have functions corresponding to getchar, gets, putchar and puts. Two of these are very closely analogous:

     getc(FileId)                   returns the next character (byte) in the
                                    file identified as FileId, or EOF if the
                                    file is exhausted.

     putc(ch, FileId)               appends the character (byte) ch onto the
                                    file identified as FileId

The string handling ones are not quite so analogous

     fgets(string, length, FileId)  reads at most length - 1 bytes from FileId
                                    into the specified string (leaving room for
                                    the NUL terminator), or stops after a
                                    linemark, whichever happens first.  Unlike
                                    the simpler gets function, any line mark
                                    read is stored in string.

     fputs(string, FileId)          appends the specified string to FileId.
                                    Unlike the simpler puts function, it does
                                    not append an extra line mark (of course,
                                    if a linemark is already present in
                                    string, this is appended to the file).

Note also that in these procedures the FileId parameter comes last, not first, as it does in fprintf and fscanf.
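
Because fgets stores the line mark, programs that want only the text of a line usually remove it. The following is a minimal sketch of one way to do so; the helper names strip_newline and read_line are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: removes the line mark that fgets stores, if any. */
void strip_newline(char *s) {
    s[strcspn(s, "\n")] = '\0';   /* strcspn finds the first '\n' or the end */
}

/* Hypothetical helper: reads one line from in into buf (at most
   size - 1 characters).  Returns 1 on success, 0 at end of file or
   on error. */
int read_line(FILE *in, char *buf, int size) {
    if (fgets(buf, size, in) == NULL) {
        return 0;
    }
    strip_newline(buf);
    return 1;
}
```

A call such as read_line(stdin, name, sizeof(name)) might then serve where gets(name) would otherwise have been used.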

When a program has finished with a file it should close it, although this often "fails safe", as a C++ system should close all its files automatically when the program terminates. Explicitly closing files is done using the fclose function, exemplified by

            fclose(infile);

The reader will probably not be surprised to learn that fclose returns a value of EOF if the file cannot be closed.
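
The file functions described in this section can be drawn together in a minimal sketch that copies one file to another, a character at a time. The helper name copy_file and its convention of returning -1 on failure are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: copies the file named src to the file named dst.
   Returns the number of characters copied, or -1 if a file cannot be
   opened or the output file cannot be closed. */
long copy_file(const char *src, const char *dst) {
    FILE *infile = fopen(src, "r");
    if (infile == NULL) {
        return -1;
    }
    FILE *outfile = fopen(dst, "w");
    if (outfile == NULL) {
        fclose(infile);
        return -1;
    }

    long count = 0;
    int ch;                        /* int, not char, so that EOF is distinct */
    while ((ch = getc(infile)) != EOF) {
        putc(ch, outfile);
        count++;
    }

    fclose(infile);
    if (fclose(outfile) == EOF) {  /* buffered data may fail to reach the disk */
        return -1;
    }
    return count;
}
```

A call such as copy_file("FILE.DAT", "RESULTS") would then copy the input file used in the earlier examples.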
