C Programming/stdio.h/gets

gets is a function in the C standard library, declared in the header file stdio.h, that reads a line from the standard input and stores it in a buffer provided by the caller.

Use of gets is strongly discouraged. It was retained in the C89 and C99 standards for backward compatibility, although it was officially deprecated in later revisions of C99. C11 removed it from the standard[1] and introduced a bounds-checking alternative, gets_s.[2] Many development tools, such as GNU ld, emit a warning when code that uses gets is linked.

Implementation

It might be implemented as follows (using getchar):

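#include <stdio.h>

char *
gets (char *s)
{
    char *ch = s;   /* next position to write in the caller's buffer */
    int k;          /* int, not char, so that EOF can be told apart from data */

    /* until we read a newline (which is discarded, not stored) */
    while ((k = getchar ()) != '\n') {

        if (k == EOF) {
            /* EOF at start of line or errors other than EOF return NULL */
            if (ch == s || !feof (stdin))
                return NULL;

            break;
        }

        /* character is stored at address, and pointer is incremented;
           nothing checks whether the buffer is already full */
        *ch++ = (char) k;
    }

    /* Null-terminating character added */
    *ch = '\0';

    /* return original pointer */
    return s;
}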

The caller must know an upper limit on the number of characters gets will read in order to make the buffer large enough, but no such limit can be determined without prior knowledge of the input. This design flaw leads to bugs and exposes programs to buffer overflow exploits. Many sources advise programmers never to use gets in new programs.[3][4][5]

Alternatives

Other line input functions can be used instead of gets to avoid buffer overflow bugs. A simple alternative is fgets. When replacing code of the form

char buffer[BUFFERSIZE];
gets(buffer);

with code of the form

char buffer[BUFFERSIZE];
fgets(buffer, sizeof(buffer), stdin);

keep in mind that the two calls differ in more than overflow protection: fgets(buffer, sizeof(buffer), stdin) preserves the terminating newline (if the input line is terminated by a newline), while gets(buffer) discards it.
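One way to handle the newline difference is sketched below. The helper read_line is hypothetical (its name, its FILE * parameter, and the decision to leave the remainder of an overlong line in the stream are assumptions of this example); it wraps fgets and strips the trailing newline so that, for lines that fit, the buffer holds what gets would have stored.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `stream` into `buf` (capacity
   `size` bytes) and strip the trailing newline, if one was read.
   Returns buf, or NULL if fgets reports EOF or an error. */
char *read_line(char *buf, size_t size, FILE *stream)
{
    /* fgets takes an int count, so the size must fit in one */
    assert(buf != NULL && size > 0 && size <= INT_MAX);

    if (fgets(buf, (int) size, stream) == NULL)
        return NULL;

    /* strcspn gives the index of the first '\n', or the string length
       if there is none, so this either removes the newline or rewrites
       the existing terminator */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A call such as read_line(buffer, sizeof buffer, stdin) can then stand in for gets(buffer). A line longer than the buffer is returned in pieces across successive calls rather than overflowing.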

The first edition of The C Programming Language did not use gets but instead described a much safer function getline(buffer, length), which would not overflow the buffer and would return the useful information of how many bytes were read (which would allow NUL to be typed) or -1 on error or EOF. It is unclear why gets ended up in the C standard library rather than this function.

POSIX.1-2008 defines getline(char **buffer, size_t *buffersize, FILE *stream), which reallocates the buffer as needed to hold the input line (note the extra level of indirection on the buffer and size).[6]
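The following is a minimal sketch of how the POSIX getline might be called, assuming a POSIX.1-2008 system. The helper longest_line is hypothetical and exists only to show the calling pattern: the buffer pointer starts as NULL with size 0, getline grows it as needed, and the caller frees it once at the end.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 declarations */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: return the length of the longest line in `stream`,
   not counting the terminating newline. */
size_t longest_line(FILE *stream)
{
    char *line = NULL;  /* getline allocates the buffer when this is NULL */
    size_t cap = 0;     /* current capacity, updated by getline */
    ssize_t len;        /* bytes read including the newline, or -1 */
    size_t longest = 0;

    assert(stream != NULL);

    while ((len = getline(&line, &cap, stream)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;      /* do not count the newline itself */
        if ((size_t) len > longest)
            longest = (size_t) len;
    }

    free(line);         /* the buffer belongs to the caller */
    return longest;
}
```

Because the buffer is resized on demand, no line length has to be guessed in advance. The cost is a dependency on POSIX and on dynamic allocation.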

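C11 specifies the replacement function gets_s(char *buffer, rsize_t n) in its optional Annex K. If the line does not fit in n-1 characters, gets_s treats this as a runtime-constraint violation: it consumes the whole current line, leaves an empty string in the buffer, and returns a null pointer.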
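Because gets_s is not available in every C library, the overflow behaviour described above can be approximated portably. The function bounded_gets below is a hypothetical stand-in written for this example, not the standard function: it takes an explicit FILE * (an assumption made so that it can be exercised on any stream), discards the rest of a line that does not fit, leaves an empty string in the buffer, and reports the condition by returning NULL.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of a bounded line reader. Stores at most n-1
   characters plus a terminator in buf. The newline is consumed but not
   stored. Returns buf on success, NULL on overflow, on EOF with nothing
   read, or on a read error. */
char *bounded_gets(char *buf, size_t n, FILE *stream)
{
    size_t i = 0;
    int c;

    assert(buf != NULL && n > 0 && stream != NULL);

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (i + 1 >= n) {
            /* no room for this character plus the terminator:
               discard the rest of the line and leave an empty string */
            while ((c = getc(stream)) != EOF && c != '\n')
                ;
            buf[0] = '\0';
            return NULL;
        }
        buf[i++] = (char) c;
    }

    if (c == EOF && (i == 0 || ferror(stream))) {
        buf[0] = '\0';
        return NULL;
    }

    buf[i] = '\0';
    return buf;
}
```

Consuming the whole overlong line keeps the stream positioned at the start of the next line, so a single oversized input does not corrupt the lines that follow it.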

Safe use

Safe use of gets requires the programmer to ensure that buffer overflows cannot occur. The only portable way is to guarantee that the input cannot contain lines longer than the buffer, for example by ensuring that the file was created by a program that cannot write such lines. There are a number of other, relatively complicated ways to protect against buffer overflows, with varying degrees of portability. One possibility is to use a guard page to protect memory. On its own, this turns exploitable buffer overflows into mere crashes. In combination with an exception handler, such as one involving SIGSEGV and sigaction, the guard page can allow graceful error handling.
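The guard page idea can be sketched as follows, assuming a POSIX-like system that provides mmap, mprotect, and the widely available but non-standard MAP_ANONYMOUS flag. The names guarded_alloc and guarded_free are hypothetical. The buffer is placed so that its last byte is immediately followed by an inaccessible page, which means a write past the end faults instead of silently corrupting memory. The SIGSEGV handler mentioned above is not shown.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t round_to_pages(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/* Hypothetical helper: return a buffer of `len` bytes whose end abuts
   a PROT_NONE guard page, or NULL on failure. */
char *guarded_alloc(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(len > 0 && ps > 0);

    size_t page = (size_t) ps;
    size_t data = round_to_pages(len, page);

    /* map the data pages plus one extra page for the guard */
    char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* make the final page inaccessible */
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }

    /* position the buffer so that buf + len is the start of the guard */
    return base + data - len;
}

/* Release a buffer obtained from guarded_alloc with the same `len`. */
void guarded_free(char *buf, size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t data = round_to_pages(len, page);

    munmap(buf + len - data, data + page);
}
```

This only converts an overflow into a fault. It does not make the input any shorter, so the bounded alternatives described earlier remain the simpler choice.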

References

  1. n1548, p.xiv
  2. n1548, K.3.5.4.1
  3. GNU. "Line Input". The GNU C Library. GNU. http://www.gnu.org/software/libc/manual/html_node/Line-Input.html#Line-Input. Retrieved 2008-08-02. "The gets function is very dangerous because it provides no protection against overflowing the string s. The GNU library includes it for compatibility only. You should always use fgets or getline instead."  (Emphasis in original.)
  4. "Why does everyone say not to use gets()?". comp.lang.c Frequently Asked Questions. Retrieved 2008-08-02.
  5. "gets(3)". man. http://linux.die.net/man/3/gets. Retrieved 2008-08-02. "Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security." 
  6. "getdelim". The Open Group.