Last modified on 21 June 2014, at 13:56

C Programming/File IO

Previous: Standard libraries Index Next: Beginning exercises

IntroductionEdit

The stdio.h header declares a broad assortment of functions that perform input and output to files and devices such as the console. It was one of the earliest headers to appear in the C library. It declares more functions than any other standard header and also requires more explanation because of the complex machinery that underlies the functions.

The device-independent model of input and output has seen dramatic improvement over the years and has received little recognition for its success. FORTRAN II was touted as a machine-independent language in the 1960s, yet it was essentially impossible to move a FORTRAN program between architectures without some change. In FORTRAN II, you named the device you were talking to right in the FORTRAN statement in the middle of your FORTRAN code. So, you said READ INPUT TAPE 5 on a tape-oriented IBM 7090 but READ CARD to read a card image on other machines. FORTRAN IV had more generic READ and WRITE statements, specifying a logical unit number (LUN) instead of the device name. The era of device-independent I/O had dawned.

Peripheral devices such as printers still had fairly strong notions about what they were asked to do. And then, peripheral interchange utilities were invented to handle bizarre devices. When cathode-ray tubes came onto the scene, each manufacturer of consoles solved problems such as console cursor movement in an independent manner, causing further headaches.

It was into this atmosphere that Unix was born. Ken Thompson and Dennis Ritchie, the developers of Unix, deserve credit for packing any number of bright ideas into the operating system. Their approach to device independence was one of the brightest.

The ANSI C <stdio.h> library is based on the original Unix file I/O primitives but casts a wider net to accommodate the least-common denominator across varied systems.

StreamsEdit

Input and output, whether to or from physical devices such as terminals and tape drives, or whether to or from files supported on structured storage devices, are mapped into logical data streams, whose properties are more uniform than their various inputs and outputs. Two forms of mapping are supported: text streams and binary streams.

A text stream consists of one or more lines. A line in a text stream consists of zero or more characters plus a terminating new-line character. (The only exception is that in some implementations the last line of a file does not require a terminating new-line character.) Unix adopted a standard internal format for all text streams. Each line of text is terminated by a new-line character. That's what any program expects when it reads text, and that's what any program produces when it writes text. (This is the most basic convention, and if it doesn't meet the needs of a text-oriented peripheral attached to a Unix machine, then the fix-up occurs out at the edges of the system. Nothing in between needs to change.) The string of characters that go into, or come out of a text stream may have to be modified to conform to specific conventions. This results in a possible difference between the data that go into a text stream and the data that come out. For instance, in some implementations when a space-character precedes a new-line character in the input, the space character gets removed out of the output. In general, when the data only consists of printable characters and control characters like horizontal tab and new-line, the input and output of a text stream are equal.

Compared to a text stream, a binary stream is pretty straight forward. A binary stream is an ordered sequence of characters that can transparently record internal data. Data written to a binary stream shall always equal the data that gets read out under the same implementation. Binary streams, however, may have an implementation-defined number of null characters appended to the end of the stream. There are no further conventions which need to be considered.

Nothing in Unix prevents the program from writing arbitrary 8-bit binary codes to any open file, or reading them back unchanged from an adequate repository. Thus, Unix obliterated the long-standing distinction between text streams and binary streams.

Standard StreamsEdit

When a C program starts its execution the program automatically opens three standard streams named stdin, stdout, and stderr. These are attached for every C program.

The first standard stream is used for input buffering and the other two are used for output. These streams are sequences of bytes.

Consider the following program:

 /* An example program. */
 int main()
 {
     int var;
     scanf ("%d", &var); /* use stdin for scanning an integer from keyboard. */
     printf ("%d", var); /* use stdout for printing a character. */
     return 0;
 }
 /* end program. */

By default stdin points to the keyboard and stdout and stderr point to the screen. It is possible under Unix and may be possible under other operating systems to redirect input from or output to a file or both.

FILE pointersEdit

The <stdio.h> header contains a definition for a type FILE (usually via a typedef) which is capable of processing all the information needed to exercise control over a stream, including its file position indicator, a pointer to the associated buffer (if any), an error indicator that records whether a read/write error has occurred, and an end-of-file indicator that records whether the end of the file has been reached.

It is considered bad manners to access the contents of FILE directly unless the programmer is writing an implementation of <stdio.h> and its contents. Better access to the contents of FILE is provided via the functions in <stdio.h>. It can be said that the FILE type is an early example of object-oriented programming.

Opening and Closing FilesEdit

To open and close files, the <stdio.h> library has three functions: fopen, freopen, and fclose.

Opening FilesEdit

 #include <stdio.h>
 FILE *fopen(const char *filename, const char *mode);
 FILE *freopen(const char *filename, const char *mode, FILE *stream);

fopen and freopen opens the file whose name is in the string pointed to by filename and associates a stream with it. Both return a pointer to the object controlling the stream, or if the open operation fails a null pointer. The error and end-of-file indicators are cleared, and if the open operation fails error is set. freopen differs from fopen in that the file pointed to by stream is closed first when already open and any close errors are ignored.

mode for both functions points to a string consisting of one of the following sequences:

r           open a text file for reading
w           truncate to zero length or create a text file for writing
a           append; open or create text file for writing at end-of-file
rb          open binary file for reading
wb          truncate to zero length or create a binary file for writing
ab          append; open or create binary file for writing at end-of-file
r+          open text file for update (reading and writing)
w+          truncate to zero length or create a text file for update
a+          append; open or create text file for update
r+b or rb+  open binary file for update (reading and writing)
w+b or wb+  truncate to zero length or create a binary file for update
a+b or ab+  append; open or create binary file for update

Opening a file with read mode ('r' as the first character in the mode argument) fails if the file does not exist or cannot be read.

Opening a file with append mode ('a' as the first character in the mode argument) causes all subsequent writes to the file to be forced to the then-current end-of-file, regardless of intervening calls to the fseek function. In some implementations, opening a binary file with append mode ('b' as the second or third character in the above list of mode arguments) may initially position the file position indicator for the stream beyond the last data written, because of null character padding.

When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output may not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input may not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

When opened, a stream is fully buffered if and only if it can be determined not to refer to an interactive device.

Closing FilesEdit

 #include <stdio.h>
 int fclose(FILE *stream);

The fclose function causes the stream pointed to by stream to be flushed and the associated file to be closed. Any unwritten buffered data for the stream are delivered to the host environment to be written to the file; any unread buffered data are discarded. The stream is disassociated from the file. If the associated buffer was automatically allocated, it is deallocated. The function returns zero if the stream was successfully closed or EOF if any errors were detected.

Other file access functionsEdit

The fflush functionEdit

 #include <stdio.h>
 int fflush(FILE *stream);

If stream points to an output stream or an update stream in which the most recent operation was not input, the fflush function causes any unwritten data for that stream to be deferred to the host environment to be written to the file. The behavior of fflush is undefined for input stream.

If stream is a null pointer, the fflush function performs this flushing action on all streams for which the behavior is defined above.

The fflush functions returns EOF if a write error occurs, otherwise zero.

The reason for having a fflush function is because streams in C can have buffered input/output; that is, functions that write to a file actually write to a buffer inside the FILE structure. If the buffer is filled to capacity, the write functions will call fflush to actually "write" the data that is in the buffer to the file. Because fflush is only called every once in a while, calls to the operating system to do a raw write are minimized.

The setbuf functionEdit

 #include <stdio.h>
 void setbuf(FILE *stream, char *buf);

Except that it returns no value, the setbuf function is equivalent to the setvbuf function invoked with the values _IOFBF for mode and BUFSIZ for size, or (if buf is a null pointer) with the value _IONBF for mode.

The setvbuf functionEdit

 #include <stdio.h>
 int setvbuf(FILE *stream, char *buf, int mode, size_t size);

The setvbuf function may be used only after the stream pointed to by stream has been associated with an open file and before any other operation is performed on the stream. The argument mode determines how the stream will be buffered, as follows: _IOFBF causes input/output to be fully buffered; _IOLBF causes input/output to be line buffered; _IONBF causes input/output to be unbuffered. If buf is not a null pointer, the array it points to may be used instead of a buffer associated by the setvbuf function. (The buffer must have a lifetime at least as great as the open stream, so the stream should be closed before a buffer that has automatic storage duration is deallocated upon block exit.) The argument size specifies the size of the array. The contents of the array at any time are indeterminate.

The setvbuf function returns zero on success, or nonzero if an invalid value is given for mode or if the request cannot be honored.

Functions that Modify the File Position IndicatorEdit

The stdio.h library has five functions that affect the file position indicator besides those that do reading or writing: fgetpos, fseek, fsetpos, ftell, and rewind.

The fseek and ftell functions are older than fgetpos and fsetpos.

The fgetpos and fsetpos functionsEdit

 #include <stdio.h>
 int fgetpos(FILE *stream, fpos_t *pos);
 int fsetpos(FILE *stream, const fpos_t *pos);

The fgetpos function stores the current value of the file position indicator for the stream pointed to by stream in the object pointed to by pos. The value stored contains unspecified information usable by the fsetpos function for repositioning the stream to its position at the time of the call to the fgetpos function.

If successful, the fgetpos function returns zero; on failure, the fgetpos function returns nonzero and stores an implementation-defined positive value in errno.

The fsetpos function sets the file position indicator for the stream pointed to by stream according to the value of the object pointed to by pos, which shall be a value obtained from an earlier call to the fgetpos function on the same stream.

A successful call to the fsetpos function clears the end-of-file indicator for the stream and undoes any effects of the ungetc function on the same stream. After an fsetpos call, the next operation on an update stream may be either input or output.

If successful, the fsetpos function returns zero; on failure, the fsetpos function returns nonzero and stores an implementation-defined positive value in errno.

The fseek and ftell functionsEdit

 #include <stdio.h>
 int fseek(FILE *stream, long int offset, int whence);
 long int ftell(FILE *stream);

The fseek function sets the file position indicator for the stream pointed to by stream.

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. Three macros in stdio.h called SEEK_SET, SEEK_CUR, and SEEK_END expand to unique values. If the position specified by whence is SEEK_SET, the specified position is the beginning of the file; if whence is SEEK_END, the specified position is the end of the file; and if whence is SEEK_CUR, the specified position is the current file position. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier call to the ftell function on the same stream and whence shall be SEEK_SET.

The fseek function returns nonzero only for a request that cannot be satisfied.

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file; for a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

If successful, the ftell function returns the current value of the file position indicator for the stream. On failure, the ftell function returns -1L and stores an implementation-defined positive value in errno.

The rewind functionEdit

 #include <stdio.h>
 void rewind(FILE *stream);

The rewind function sets the file position indicator for the stream pointed to by stream to the beginning of the file. It is equivalent to

(void)fseek(stream, 0L, SEEK_SET)

except that the error indicator for the stream is also cleared.

Error Handling FunctionsEdit

The clearerr functionEdit

 #include <stdio.h>
 void clearerr(FILE *stream);

The clearerr function clears the end-of-file and error indicators for the stream pointed to by stream.

The feof functionEdit

 #include <stdio.h>
 int feof(FILE *stream);

The feof function tests the end-of-file indicator for the stream pointed to by stream and returns nonzero if and only if the end-of-file indicator is set for stream, otherwise it returns zero.

The ferror functionEdit

 #include <stdio.h>
 int ferror(FILE *stream);

The ferror function tests the error indicator for the stream pointed to by stream and returns nonzero if and only if the error indicator is set for stream, otherwise it returns zero.

The perror functionEdit

 #include <stdio.h>
 void perror(const char *s);

The perror function maps the error number in the integer expression errno to an error message. It writes a sequence of characters to the standard error stream thus: first, if s is not a null pointer and the character pointed to by s is not the null character, the string pointed to by s followed by a colon (:) and a space; then an appropriate error message string followed by a new-line character. The contents of the error message are the same as those returned by the strerror function with the argument errno, which are implementation-defined.

Other Operations on FilesEdit

The stdio.h library has a variety of functions that do some operation on files besides reading and writing.

The remove functionEdit

 #include <stdio.h>
 int remove(const char *filename);

The remove function causes the file whose name is the string pointed to by filename to be no longer accessible by that name. A subsequent attempt to open that file using that name will fail, unless it is created anew. If the file is open, the behavior of the remove function is implementation-defined.

The remove function returns zero if the operation succeeds, nonzero if it fails.

The rename functionEdit

 #include <stdio.h>
 int rename(const char *old_filename, const char *new_filename);

The rename function causes the file whose name is the string pointed to by old_filename to be henceforth known by the name given by the string pointed to by new_filename. The file named old_filename is no longer accessible by that name. If a file named by the string pointed to by new_filename exists prior to the call to the rename function, the behavior is implementation-defined.

The rename function returns zero if the operation succeeds, nonzero if it fails, in which case if the file existed previously it is still known by its original name.

The tmpfile functionEdit

 #include <stdio.h>
 FILE *tmpfile(void);

The tmpfile function creates a temporary binary file that will automatically be removed when it is closed or at program termination. If the program terminates abnormally, whether an open temporary file is removed is implementation-defined. The file is opened for update with "wb+" mode.

The tmpfile function returns a pointer to the stream of the file that it created. If the file cannot be created, the tmpfile function returns a null pointer.

The tmpnam functionEdit

 #include <stdio.h>
 char *tmpnam(char *s);

The tmpnam function generates a string that is a valid file name and that is not the name of an existing file.

The tmpnam function generates a different string each time it is called, up to TMP_MAX times. (TMP_MAX is a macro defined in stdio.h.) If it is called more than TMP_MAX times, the behavior is implementation-defined.

The implementation shall behave as if no library function calls the tmpnam function.

If the argument is a null pointer, the tmpnam function leaves its result in an internal static object and returns a pointer to that object. Subsequent calls to the tmpnam function may modify the same object. If the argument is not a null pointer, it is assumed to point to an array of at least L_tmpnam characters (L_tmpnam is another macro in stdio.h); the tmpnam function writes its result in that array and returns the argument as its value.

The value of the macro TMP_MAX must be at least 25.

Reading from FilesEdit

Character Input FunctionsEdit

The fgetc functionEdit

 #include <stdio.h>
 int fgetc(FILE *stream);

The fgetc function obtains the next character (if present) as an unsigned char converted to an int, from the input stream pointed to by stream, and advances the associated file position indicator for the stream (if defined).

The fgetc function returns the next character from the input stream pointed to by stream. If the stream is at end-of-file, the end-of-file indicator for the stream is set and fgetc returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for the stream is set and fgetc returns EOF.

The fgets functionEdit

#include <stdio.h>
char *fgets(char *s, int n, FILE *stream);

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

Warning: Different operating systems may use different character sequences to represent the end-of-line sequence. For example, some filesystems use the terminator \r\n in text files; fgets may read those lines, removing the \n but keeping the \r as the last character of s. This expurious character should be removed in the string s before the string is used for anything (unless the programmer doesn't care about it). Unixes typically use \n as its end-of-line sequence, MS-DOS and Windows uses \r\n, and Mac OSes used \r before OS X.

 /* A example program that reads from stdin and writes to stdout */
 #include <stdio.h>
 
 #define BUFFER_SIZE 100
 
 int main(void)
 {
     char buffer[BUFFER_SIZE]; /* a read buffer */
     while( fgets (buffer, BUFFER_SIZE, stdin) != NULL)
     {
          printf("%s",buffer);
     }
     return 0;
 }
 /* end program. */

The getc functionEdit

#include <stdio.h>
int getc(FILE *stream);

The getc function is equivalent to fgetc, except that it may be implemented as a macro. If it is implemented as a macro, the stream argument may be evaluated more than once, so the argument should never be an expression with side effects (i.e. have an assignment, increment, or decrement operators, or be a function call).

The getc function returns the next character from the input stream pointed to by stream. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getc returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for the stream is set and getc returns EOF.

The getchar functionEdit

#include <stdio.h>
int getchar(void);

The getchar function is equivalent to getc with the argument stdin.

The getchar function returns the next character from the input stream pointed to by stdin. If stdin is at end-of-file, the end-of-file indicator for stdin is set and getchar returns EOF (EOF is a negative value defined in <stdio.h>, usually (-1)). If a read error occurs, the error indicator for stdin is set and getchar returns EOF.

The gets functionEdit

#include <stdio.h>
char *gets(char *s);

The gets function reads characters from the input stream pointed to by stdin into the array pointed to by s until an end-of-file is encountered or a new-line character is read. Any new-line character is discarded, and a null character is written immediately after the last character read into the array.

The gets function returns s if successful. If the end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

This function and description is only included here for completeness. Most C programmers nowadays shy away from using gets, as there is no way for the function to know how big the buffer is that the programmer wants to read into. Commandment #5 of Henry Spencer's The Ten Commandments for C Programmers (Annotated Edition) reads, "Thou shalt check the array bounds of all strings (indeed, all arrays), for surely where thou typest foo someone someday shall type supercalifragilisticexpialidocious." It mentions gets in the annotation: "As demonstrated by the deeds of the Great Worm, a consequence of this commandment is that robust production software should never make use of gets(), for it is truly a tool of the Devil. Thy interfaces should always inform thy servants of the bounds of thy arrays, and servants who spurn such advice or quietly fail to follow it should be dispatched forthwith to the Land Of Rm, where they can do no further harm to thee."

The ungetc functionEdit

#include <stdio.h>
int ungetc(int c, FILE *stream);

The ungetc function pushes the character specified by c (converted to an unsigned char) back onto the input stream pointed to by stream. The pushed-back characters will be returned by subsequent reads on that stream in the reverse order of their pushing. A successful intervening call (with the stream pointed to by stream) to a file-positioning function (fseek, fsetpos, or rewind) discards any pushed-back characters for the stream. The external storage corresponding to the stream is unchanged.

One character of pushback is guaranteed. If the ungetc function is called too many times on the same stream without an intervening read or file positioning operation on that stream, the operation may fail.

If the value of c equals that of the macro EOF, the operation fails and the input stream is unchanged.

A successful call to the ungetc function clears the end-of-file indicator for the stream. The value of the file position indicator for the stream after reading or discarding all pushed-back characters shall be the same as it was before the characters were pushed back. For a text stream, the value of its file-position indicator after a successful call to the ungetc function is unspecified until all pushed-back characters are read or discarded. For a binary stream, its file position indicator is decremented by each successful call to the ungetc function; if its value was zero before a call, it is indeterminate after the call.

The ungetc function returns the character pushed back after conversion, or EOF if the operation fails.


EOF pitfallEdit

A mistake when using fgetc, getc, or getchar is to assign the result to a variable of type char before comparing it to EOF. The following code fragments exhibit this mistake, and then show the correct approach (using type int):

Mistake Correction
char c;
while ((c = getchar()) != EOF)
    putchar(c);
int c;
while ((c = getchar()) != EOF)
    putchar(c);

Consider a system in which the type char is 8 bits wide, representing 256 different values. getchar may return any of the 256 possible characters, and it also may return EOF to indicate end-of-file, for a total of 257 different possible return values.

When getchar's result is assigned to a char, which can represent only 256 different values, there is necessarily some loss of information—when packing 257 items into 256 slots, there must be a collision. The EOF value, when converted to char, becomes indistinguishable from whichever one of the 256 characters shares its numerical value. If that character is found in the file, the above example may mistake it for an end-of-file indicator; or, just as bad, if type char is unsigned, then because EOF is negative, it can never be equal to any unsigned char, so the above example will not terminate at end-of-file. It will loop forever, repeatedly printing the character which results from converting EOF to char.

However, this looping failure mode does not occur if the char definition is signed (C makes the signedness of the default char type implementation-dependent),[1] assuming the commonly used EOF value of -1. However, the fundamental issue remains that if the EOF value is defined outside of the range of the char type, when assigned to a char that value is sliced and will no longer match the full EOF value necessary to exit the loop. On the other hand, if EOF is within range of char, this guarantees a collision between EOF and a char value. Thus, regardless of how system types are defined, never use char types when testing against EOF.

On systems where int and char are the same size (i.e., systems incompatible with minimally the POSIX and C99 standards), even the "good" example will suffer from the indistinguishability of EOF and some character's value. The proper way to handle this situation is to check feof and ferror after getchar returns EOF. If feof indicates that end-of-file has not been reached, and ferror indicates that no errors have occurred, then the EOF returned by getchar can be assumed to represent an actual character. These extra checks are rarely done, because most programmers assume that their code will never need to run on one of these "big char" systems. Another way is to use a compile-time assertion to make sure that UINT_MAX > UCHAR_MAX, which at least prevents a program with such an assumption from compiling in such a system.


Direct input function: the fread functionEdit

#include <stdio.h>
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.

The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered. If size or nmemb is zero, fread returns zero and the contents of the array and the state of the stream remain unchanged.

Formatted input functions: the scanf family of functionsEdit

#include <stdio.h>
int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *s, const char *format, ...);

The fscanf function reads input from the stream pointed to by stream, under control of the string pointed to by format that specifies the admissible sequences and how they are to be converted for assignment, using subsequent arguments as pointers to the objects to receive converted input. If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored.

The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: one or more white-space characters; an ordinary multibyte character (neither % or a white-space character); or a conversion specification. Each conversion specification is introduced by the character %. After the %, the following appear in sequence:

  • An optional assignment-suppressing character *.
  • An optional nonzero decimal integer that specifies the maximum field width.
  • An optional h, l (ell) or L indicating the size of the receiving object. The conversion specifiers d, i, and n shall be preceded by h if the corresponding argument is a pointer to short int rather than a pointer to int, or by l if it is a pointer to long int. Similarly, the conversion specifiers o, u, and x shall be preceded by h if the corresponding argument is a pointer to unsigned short int rather than unsigned int, or by l if it is a pointer to unsigned long int. Finally, the conversion specifiers e, f, and g shall be preceded by l if the corresponding argument is a pointer to double rather than a pointer to float, or by L if it is a pointer to long double. If an h, l, or L appears with any other format specifier, the behavior is undefined.
  • A character that specifies the type of conversion to be applied. The valid conversion specifiers are described below.

The fscanf function executes each directive of the format in turn. If a directive fails, as detailed below, the fscanf function returns. Failures are described as input failures (due to the unavailability of input characters) or matching failures (due to inappropriate input).

A directive composed of white-space character(s) is executed by reading input up to the first non-white-space character (which remains unread) or until no more characters remain unread.

A directive that is an ordinary multibyte character is executed by reading the next characters of the stream. If one of the characters differs from one comprising the directive, the directive fails, and the differing and subsequent characters remain unread.

A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:

Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier. (The white-space characters are not counted against the specified field width.)

An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest matching sequences of input characters, unless that exceeds a specified field width, in which case it is the initial subsequence of that length in the sequence. The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure, unless an error prevented input from the stream, in which case it is an input failure.

Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails; this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the space provided, the behavior is undefined.

The following conversion specifiers are valid:

d 
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument shall be a pointer to integer.
i 
Matches an optionally signed integer, whose format is the same as expected for the subject sequence of the strtol function with the value 0 for the base argument. The corresponding argument shall be a pointer to integer.
o 
Matches an optionally signed octal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 8 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
u 
Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
x 
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
e, f, g 
Matches an optionally signed floating-point number, whose format is the same as expected for the subject string of the strtod function. The corresponding argument will be a pointer to floating.
s 
Matches a sequence of non-white-space characters. (No special provisions are made for multibyte characters.) The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically.
[ 
Matches a nonempty sequence of characters (no special provisions are made for multibyte characters) from a set of expected characters (the scanset). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence and a terminating null character, which will be added automatically. The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) comprise the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all the characters that do not appear in the scanlist between the circumflex and the right bracket. If the conversion specifier begins with [] or [^], the right-bracket character is in the scanlist and the next right bracket character is the matching right bracket that ends the specification; otherwise, the first right bracket character is the one that ends the specification. If a - character is in the scanlist and is not the first, nor the second where the first character is a ^, nor the last character, the behavior is implementation-defined.
c 
Matches a sequence of characters (no special provisions are made for multibyte characters) of the number specified by the field width (1 if no field width is present in the directive). The corresponding argument shall be a pointer to the initial character of an array large enough to accept the sequence. No null character is added.
p 
Matches an implementation-defined set of sequences, which should be the same as the set of sequences that may be produced by the %p conversion of the fprintf function. The corresponding argument shall be a pointer to void. The interpretation of the input then is implementation-defined. If the input item is a value converted earlier during the same program execution, the pointer that results shall compare equal to that value; otherwise the behavior of the %p conversion is undefined.
n 
No input is consumed. The corresponding argument shall be a pointer to integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function. Execution of a %n directive does not increment the assignment count returned at the completion of execution of the fscanf function.
% 
Matches a single %; no conversion or assignment occurs. The complete conversion specification shall be %%.

If a conversion specification is invalid, the behavior is undefined.

The conversion specifiers E, G, and X are also valid and behave the same as, respectively, e, g, and x.

If end-of-file is encountered during input, conversion is terminated. If end-of-file occurs before any characters matching the current directive have been read (other than leading white space, where permitted), execution of the current directive terminates with an input failure; otherwise, unless execution of the current directive is terminated with a matching failure, execution of the following directive (if any) is terminated with an input failure.

If conversion terminates on a conflicting input character, the offending input character is left unread in the input stream. Trailing white space (including new-line characters) is left unread unless matched by a directive. The success of literal matches and suppressed assignments is not directly determinable other than via the %n directive.

The fscanf function returns the value of the macro EOF if an input failure occurs before any conversion. Otherwise, the fscanf function returns the number of input items assigned, which can be fewer than provided for, or even zero, in the event of an early matching failure.

The scanf function is equivalent to fscanf with the argument stdin interposed before the arguments to scanf. Its return value is similar to that of fscanf.

The sscanf function is equivalent to fscanf, except that the argument s specifies a string from which the input is to be obtained, rather than from a stream. Reaching the end of the string is equivalent to encountering the end-of-file for the fscanf function. If copying takes place between objects that overlap, the behavior is undefined.

Writing to FilesEdit

Character Output FunctionsEdit

The fputc functionEdit

#include <stdio.h>
int fputc(int c, FILE *stream);

The fputc function writes the character specified by c (converted to an unsigned char) to the stream pointed to by stream at the position indicated by the associated file position indicator (if defined), and advances the indicator appropriately. If the file cannot support positioning requests, or if the stream is opened with append mode, the character is appended to the output stream. The function returns the character written, unless a write error occurs, in which case the error indicator for the stream is set and fputc returns EOF.

The fputs functionEdit

#include <stdio.h>
int fputs(const char *s, FILE *stream);

The fputs function writes the string pointed to by s to the stream pointed to by stream. The terminating null character is not written. The function returns EOF if a write error occurs, otherwise it returns a nonnegative value.

The putc functionEdit

#include <stdio.h>
int putc(int c, FILE *stream);

The putc function is equivalent to fputc, except that if it is implemented as a macro, it may evaluate stream more than once, so the argument should never be an expression with side effects. The function returns the character written, unless a write error occurs, in which case the error indicator for the stream is set and the function returns EOF.

The putchar functionEdit

#include <stdio.h>
int putchar(int c);

The putchar function is equivalent to putc with the second argument stdout. It returns the character written, unless a write error occurs, in which case the error indicator for stdout is set and the function returns EOF.

The puts functionEdit

#include <stdio.h>
int puts(const char *s);

The puts function writes the string pointed to by s to the stream pointed to by stdout, and appends a new-line character to the output. The terminating null character is not written. The function returns EOF if a write error occurs; otherwise, it returns a nonnegative value.

Direct output function: the fwrite functionEdit

#include <stdio.h>
size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

The fwrite function writes, from the array pointed to by ptr, up to nmemb elements whose size is specified by size to the stream pointed to by stream. The file position indicator for the stream (if defined) is advanced by the number of characters successfully written. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. The function returns the number of elements successfully written, which will be less than nmemb only if a write error is encountered.

Formatted output functions: the printf family of functionsEdit

#include <stdarg.h>
#include <stdio.h>
int fprintf(FILE *stream, const char *format, ...);
int printf(const char *format, ...);
int sprintf(char *s, const char *format, ...);
int vfprintf(FILE *stream, const char *format, va_list arg);
int vprintf(const char *format, va_list arg);
int vsprintf(char *s, const char *format, va_list arg);

Note: Some length specifiers and format specifiers are new in C99. These may not be available in older compilers and versions of the stdio library, which adhere to the C89/C90 standard. Wherever possible, the new ones will be marked with (C99).

The fprintf function writes output to the stream pointed to by stream under control of the string pointed to by format that specifies how subsequent arguments are converted for output. If there are insufficient arguments for the format, the behavior is undefined. If the format is exhausted while arguments remain, the excess arguments are evaluated (as always) but are otherwise ignored. The fprintf function returns when the end of the format string is encountered.

The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: ordinary multibyte characters (not %), which are copied unchanged to the output stream; and conversion specifications, each of which results in fetching zero or more subsequent arguments, converting them, if applicable, according to the corresponding conversion specifier, and then writing the result to the output stream.

Each conversion specification is introduced by the character %. After the %, the following appear in sequence:

  • Zero or more flags (in any order) that modify the meaning of the conversion specification.
  • An optional minimum field width. If the converted value has fewer characters than the field width, it is padded with spaces (by default) on the left (or right, if the left adjustment flag, described later, has been given) to the field width. The field width takes the form of an asterisk * (described later) or a decimal integer. (Note that 0 is taken as a flag, not as the beginning of a field width.)
  • An optional precision that gives the minimum number of digits to appear for the d, i, o, u, x, and X conversions, the number of digits to appear after the decimal-point character for a, A, e, E, f, and F conversions, the maximum number of significant digits for the g and G conversions, or the maximum number of characters to be written from a string in s conversions. The precision takes the form of a period (.) followed either by an asterisk * (described later) or by an optional decimal integer; if only the period is specified, the precision is taken as zero. If a precision appears with any other conversion specifier, the behavior is undefined. Floating-point numbers are rounded to fit the precision; i.e. printf("%1.1f\n", 1.19); produces 1.2.
  • An optional length modifier that specifies the size of the argument.
  • A conversion specifier character that specifies the type of conversion to be applied.

As noted above, a field width, or precision, or both, may be indicated by an asterisk. In this case, an int argument supplies the field width or precision. The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted. A negative field width argument is taken as a - flag followed by a positive field width. A negative precision argument is taken as if the precision were omitted.

The flag characters and their meanings are:

- 
The result of the conversion is left-justified within the field. (It is right-justified if this flag is not specified.)
+ 
The result of a signed conversion always begins with a plus or minus sign. (It begins with a sign only when a negative value is converted if this flag is not specified. The results of all floating conversions of a negative zero, and of negative values that round to zero, include a minus sign.)
space 
If the first character of a signed conversion is not a sign, or if a signed conversion results in no characters, a space is prefixed to the result. If the space and + flags both appear, the space flag is ignored.
# 
The result is converted to an "alternative form". For o conversion, it increases the precision, if and only if necessary, to force the first digit of the result to be a zero (if the value and precision are both 0, a single 0 is printed). For x (or X) conversion, a nonzero result has 0x (or 0X) prefixed to it. For a, A, e, E, f, F, g, and G conversions, the result always contains a decimal-point character, even if no digits follow it. (Normally, a decimal-point character appears in the result of these conversions only if a digit follows it.) For g and G conversions, trailing zeros are not removed from the result. For other conversions, the behavior is undefined.
0 
For d, i, o, u, x, X, a, A, e, E, f, F, g, and G conversions, leading zeros (following any indication of sign or base) are used to pad to the field width; no space padding is performed. If the 0 and - flags both appear, the 0 flag is ignored. For d, i, o, u, x, and X conversions, if a precision is specified, the 0 flag is ignored. For other conversions, the behavior is undefined.

The length modifiers and their meanings are:

hh 
(C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
h 
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
l (ell) 
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; that a following n conversion specifier applies to a pointer to a long int argument; (C99) that a following c conversion specifier applies to a wint_t argument; (C99) that a following s conversion specifier applies to a pointer to a wchar_t argument; or has no effect on a following a, A, e, E, f, F, g, or G conversion specifier.
ll (ell-ell) 
(C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a long long int or unsigned long long int argument; or that a following n conversion specifier applies to a pointer to a long long int argument.
j 
(C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to an intmax_t or uintmax_t argument; or that a following n conversion specifier applies to a pointer to an intmax_t argument.
z 
(C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a size_t or the corresponding signed integer type argument; or that a following n conversion specifier applies to a pointer to a signed integer type corresponding to size_t argument.
t 
(C99) Specifies that a following d, i, o, u, x, or X conversion specifier applies to a ptrdiff_t or the corresponding unsigned integer type argument; or that a following n conversion specifier applies to a pointer to a ptrdiff_t argument.
L 
Specifies that a following a, A, e, E, f, F, g, or G conversion specifier applies to a long double argument.

If a length modifier appears with any conversion specifier other than as specified above, the behavior is undefined.

The conversion specifiers and their meanings are:

d, i 
The int argument is converted to signed decimal in the style []dddd. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it is expanded with leading zeros. The default precision is 1. The result of converting a zero value with a precision of zero is no characters.
o, u, x, X 
The unsigned int argument is converted to unsigned octal (o), unsigned decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; the letters abcdef are used for x conversion and the letters ABCDEF for X conversion. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it is expanded with leading zeros. The default precision is 1. The result of converting a zero value with a precision of zero is no characters.
f, F 
A double argument representing a (finite) floating-point number is converted to decimal notation in the style []ddd.ddd, where the number of digits after the decimal-point character is equal to the precision specification. If the precision is missing, it is taken as 6; if the precision is zero and the # flag is not specified, no decimal-point character appears. If a decimal-point character appears, at least one digit appears before it. The value is rounded to the appropriate number of digits.
(C99) A double argument representing an infinity is converted in one of the styles [-]inf or [-]infinity — which style is implementation-defined. A double argument representing a NaN is converted in one of the styles [-]nan or [-]nan(n-char-sequence) — which style, and the meaning of any n-char-sequence, is implementation-defined. The F conversion specifier produces INF, INFINITY, or NAN instead of inf, infinity, or nan, respectively. (When applied to infinite and NaN values, the -, +, and space flags have their usual meaning; the # and 0 flags have no effect.)
e, E 
A double argument representing a (finite) floating-point number is converted in the style []d.ddddd, where there is one digit (which is nonzero if the argument is nonzero) before the decimal-point character and the number of digits after it is equal to the precision; if the precision is missing, it is taken as 6; if the precision is zero and the # flag is not specified, no decimal-point character appears. The value is rounded to the appropriate number of digits. The E conversion specifier produces a number with E instead of e introducing the exponent. The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent. If the value is zero, the exponent is zero.
(C99) A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
g, G 
A double argument representing a (finite) floating-point number is converted in style f or e (or in style F or E in the case of a G conversion specifier), with the precision specifying the number of significant digits. If the precision is zero, it is taken as 1. The style used depends on the value converted; style e (or E) is used only if the exponent resulting from such a conversion is less than –4 or greater than or equal to the precision. Trailing zeros are removed from the fractional portion of the result unless the # flag is specified; a decimal-point character appears only if it is followed by a digit.
(C99) A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
a, A 
(C99) A double argument representing a (finite) floating-point number is converted in the style []0xh.hhhhd, where there is one hexadecimal digit (which is nonzero if the argument is a normalized floating-point number and is otherwise unspecified) before the decimal-point character (Binary implementations can choose the hexadecimal digit to the left of the decimal-point character so that subsequent digits align to nibble [4-bit] boundaries.) and the number of hexadecimal digits after it is equal to the precision; if the precision is missing and FLT_RADIX is a power of 2, then the precision is sufficient for an exact representation of the value; if the precision is missing and FLT_RADIX is not a power of 2, then the precision is sufficient to distinguish (The precision p is sufficient to distinguish values of the source type if 16p–1 > bn where b is FLT_RADIX and n is the number of base-b digits in the significand of the source type. A smaller p might suffice depending on the implementation's scheme for determining the digit to the left of the decimal-point character.) values of type double, except that trailing zeros may be omitted; if the precision is zero and the # flag is not specified, no decimal-point character appears. The letters abcdef are used for a conversion and the letters ABCDEF for A conversion. The A conversion specifier produces a number with X and P instead of x and p. The exponent always contains at least one digit, and only as many more digits as necessary to represent the decimal exponent of 2. If the value is zero, the exponent is zero.
A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
c 
If no l length modifier is present, the int argument is converted to an unsigned char, and the resulting character is written.
(C99) If an l length modifier is present, the wint_t argument is converted as if by an ls conversion specification with no precision and an argument that points to the initial element of a two-element array of wchar_t, the first element containing the wint_t argument to the lc conversion specification and the second a null wide character.
s 
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. (No special provisions are made for multibyte characters.) Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many characters are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
(C99) If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are converted to multibyte characters (each as if by a call to the wcrtomb function, with the conversion state described by an mbstate_t object initialized to zero before the first wide character is converted) up to and including a terminating null wide character. The resulting multibyte characters are written up to (but not including) the terminating null character (byte). If no precision is specified, the array shall contain a null wide character. If a precision is specified, no more than that many characters (bytes) are written (including shift sequences, if any), and the array shall contain a null wide character if, to equal the multibyte character sequence length given by the precision, the function would need to access a wide character one past the end of the array. In no case is a partial multibyte character written. (Redundant shift sequences may result if multibyte characters have a state-dependent encoding.)
p 
The argument shall be a pointer to void. The value of the pointer is converted to a sequence of printable characters, in an implementation-defined manner.
n 
The argument shall be a pointer to signed integer into which is written the number of characters written to the output stream so far by this call to fprintf. No argument is converted, but one is consumed. If the conversion specification includes any flags, a field width, or a precision, the behavior is undefined.
% 
A % character is written. No argument is converted. The complete conversion specification shall be %%.

If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding coversion specification, the behavior is undefined.

In no case does a nonexistent or small field width cause truncation of a field; if the result of a conversion is wider than the field width, the field is expanded to contain the conversion result.

For a and A conversions, if FLT_RADIX is a power of 2, the value is correctly rounded to a hexadecimal floating number with the given precision.

It is recommended practice that if FLT_RADIX is not a power of 2, the result should be one of the two adjacent numbers in hexadecimal floating style with the given precision, with the extra stipulation that the error should have a correct sign for the current rounding direction.

It is recommended practice that for e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most DECIMAL_DIG, then the result should be correctly rounded. (For binary-to-decimal conversion, the result format's values are the numbers representable with the given format specifier. The number of significant digits is determined by the format specifier, and in the case of fixed-point conversion by the source value as well.) If the number of significant decimal digits is more than DECIMAL_DIG but the source value is exactly representable with DECIMAL_DIG digits, then the result should be an exact representation with trailing zeros. Otherwise, the source value is bounded by two adjacent decimal strings L < U, both having DECIMAL_DIG significant digits; the value of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that the error should have a correct sign for the current rounding direction.

The fprintf function returns the number of characters transmitted, or a negative value if an output or encoding error occurred.

The printf function is equivalent to fprintf with the argument stdout interposed before the arguments to printf. It returns the number of characters transmitted, or a negative value if an output error occurred.

The sprintf function is equivalent to fprintf, except that the argument s specifies an array into which the generated input is to be written, rather than to a stream. A null character is written at the end of the characters written; it is not counted as part of the returned sum. If copying takes place between objects that overlap, the behavior is undefined. The function returns the number of characters written in the array, not counting the terminating null character.

The vfprintf function is equivalent to fprintf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vfprintf function does not invoke the va_end macro. The function returns the number of characters transmitted, or a negative value if an output error occurred.

The vprintf function is equivalent to printf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vprintf function does not invoke the va_end macro. The function returns the number of characters transmitted, or a negative value if an output error occurred.

The vsprintf function is equivalent to sprintf, with the variable argument list replaced by arg, which shall have been initialized by the va_start macro (and possibly subsequent va_arg calls). The vsprintf function does not invoke the va_end macro. If copying takes place between objects that overlap, the behavior is undefined. The function returns the number of characters written into the array, not counting the terminating null character.


ReferencesEdit

  1. C99 §6.2.5/15
Previous: Standard libraries Index Next: Beginning exercises