C Programming/stdio.h/gets
gets is a function in the C standard library, declared in the header file stdio.h, that reads a line from the standard input and stores it in a buffer provided by the caller.
Use of gets is strongly discouraged. It was retained in the C89 and C99 standards for backward compatibility, but was officially deprecated in a late revision of C99. It was removed from the C11 standard,[1] which instead introduced a range-checking alternative, gets_s, in its optional Annex K.[2] Many development tools, such as GNU ld, emit warnings when code using gets is linked.
Implementation
It might be implemented as follows (using getchar):
#include <stdio.h>

char *
gets (char *s)
{
    char *ch = s;
    int k;

    /* read until a newline is seen */
    while ((k = getchar ()) != '\n') {
        if (k == EOF) {
            /* EOF at the start of the line, or a read error: return NULL */
            if (ch == s || !feof (stdin))
                return NULL;
            break;
        }
        /* store the character and advance the pointer;
           the buffer size is unknown, so no bounds check is possible */
        *ch++ = (char) k;
    }
    /* add the terminating null character */
    *ch = '\0';
    /* return the original pointer */
    return s;
}
To ensure the buffer is large enough, the programmer must know an upper bound on the number of characters gets will read, which is impossible without advance knowledge of the data. This design flaw leads to bugs and opens the door to security exploits through buffer overflow. Many sources advise programmers never to use gets in new programs.[3][4][5]
Alternatives
Other line input functions may be used instead of gets to avoid buffer overflow bugs. A simple alternative is fgets. When replacing code of the form
char buffer[BUFFERSIZE];
gets(buffer);
with code of the form
char buffer[BUFFERSIZE];
fgets(buffer, sizeof(buffer), stdin);
one must keep in mind that the fgets(buffer, sizeof(buffer), stdin) call differs from gets(buffer) not only in its buffer overflow protection, but also in that it preserves the terminating newline (if the input line is terminated by a newline), while gets(buffer) discards it.
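The newline difference is usually handled by stripping the newline after the call. The following is a minimal sketch of that idea, not a drop-in replacement for gets: the helper name read_line and its FILE * parameter are assumptions made for illustration. Note that fgets does not discard an over-long line; the part that did not fit stays in the stream and is returned by the next call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from stream into buf (at most
   size - 1 characters) and removes the trailing newline, if present.
   Returns buf on success, or NULL at end of file or on a read error. */
char *read_line(char *buf, size_t size, FILE *stream)
{
    if (size == 0 || fgets(buf, (int) size, stream) == NULL)
        return NULL;

    /* strcspn yields the index of the first '\n', or of the terminating
       '\0' when the line was cut short or the input had no newline. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

With this helper, the earlier replacement becomes read_line(buffer, sizeof(buffer), stdin), and the contents of buffer match what gets would have stored whenever the line fits.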
The first edition of The C Programming Language did not use gets but instead described a much safer function, getline(buffer, length), which would not overflow the buffer and would return the useful information of how many characters were read (which would allow input containing NUL to be handled), or 0 at end of file. It is unclear why gets, rather than this function, ended up in the C standard library.
POSIX.1-2008 defines getline(char **buffer, size_t *buffersize, FILE *), which reallocates the buffer as needed to hold the input line (note the extra level of indirection on the buffer and size).[6]
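The extra indirection is easiest to see in use. The sketch below assumes a POSIX.1-2008 system; the wrapper name read_line_alloc is hypothetical, and stripping the newline is a choice made here to mirror gets, not something getline does itself.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical wrapper: returns the next line of stream as a newly
   allocated string without its trailing newline, or NULL at end of
   file or on error. The caller must free() the result. */
char *read_line_alloc(FILE *stream)
{
    char *line = NULL;    /* getline allocates when the pointer is NULL */
    size_t capacity = 0;  /* getline records the allocated size here */
    ssize_t length = getline(&line, &capacity, stream);

    if (length < 0) {
        free(line);       /* a buffer may have been allocated even on failure */
        return NULL;
    }
    if (length > 0 && line[length - 1] == '\n')
        line[length - 1] = '\0';
    return line;
}
```

Because the buffer grows to fit, no line is ever truncated, at the cost of letting the input decide how much memory is allocated.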
C11 specifies, in its optional Annex K, a replacement function gets_s(char *buffer, rsize_t n) that stores an empty string in the buffer and consumes the whole current line if the line does not fit in n-1 characters.
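Since gets_s is not available in every C library, the behavior described above can be approximated with standard functions. The sketch below is only an illustration under stated assumptions: bounded_gets is a hypothetical name, it takes an explicit FILE * argument, and it does not invoke any runtime-constraint handler, so it is not an implementation of gets_s itself.

```c
#include <stdio.h>

/* Hypothetical stand-in for the behavior described above.
   Reads one line from stream into buf, dropping the newline. If the
   line does not fit in n - 1 characters, the rest of the line is read
   and discarded, buf is set to the empty string, and NULL is returned.
   NULL is also returned on a read error or when end of file is reached
   before any character is read. */
char *bounded_gets(char *buf, size_t n, FILE *stream)
{
    size_t used = 0;
    int too_long = 0;
    int c;

    if (buf == NULL || n == 0)
        return NULL;

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (used + 1 < n)
            buf[used++] = (char) c;
        else
            too_long = 1;  /* keep reading so the whole line is consumed */
    }

    if (too_long || ferror(stream) || (c == EOF && used == 0)) {
        buf[0] = '\0';
        return NULL;
    }
    buf[used] = '\0';
    return buf;
}
```

Unlike fgets, this never leaves the tail of an over-long line in the stream, so the next call starts at the next line.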
Safe use
Safe use of gets requires the programmer to ensure that buffer overflows cannot occur. The only portable way is to guarantee that the input file cannot contain lines longer than the buffer, for example by ensuring that the file was created by a program that cannot write such lines. There are a number of other, relatively complicated ways to protect against buffer overflows, with varying degrees of portability. One possibility is to use a guard page to protect memory. On its own, this turns exploitable buffer overflows into mere crashes. In combination with an exception handler, such as a SIGSEGV handler installed with sigaction, the guard page can allow graceful error handling.
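The guard-page idea can be sketched as follows. This is a non-portable illustration that assumes a Linux-like POSIX system with mmap, mprotect and MAP_ANONYMOUS; the names guarded_alloc and guarded_fill are hypothetical, and guarded_fill merely stands in for an unbounded writer such as gets. Leaving a SIGSEGV handler with siglongjmp is a common practice but goes beyond what the C standard guarantees, and the sketch never unmaps the buffer.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Signal handler: abandon the faulting write and resume in guarded_fill. */
static void on_fault(int sig)
{
    (void) sig;
    siglongjmp(fault_jmp, 1);
}

/* Returns a buffer of size bytes whose last byte lies directly before
   an inaccessible guard page, or NULL on failure. */
char *guarded_alloc(size_t size)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || size == 0)
        return NULL;

    size_t page = (size_t) ps;
    size_t data = (size + page - 1) / page * page;  /* round up to whole pages */
    char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    return base + data - size;  /* buffer ends where the guard page begins */
}

/* Stand-in for an unbounded writer: stores count bytes into buf.
   Returns 0 on success, or -1 if a write hit the guard page. */
int guarded_fill(char *buf, size_t count)
{
    struct sigaction sa, old_segv, old_bus;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);  /* some systems report SIGBUS instead */

    if (sigsetjmp(fault_jmp, 1) == 0) {
        volatile char *p = buf;
        for (size_t i = 0; i < count; i++)
            p[i] = 'x';
    } else {
        result = -1;  /* reached through siglongjmp from on_fault */
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return result;
}
```

With a 16-byte buffer from guarded_alloc(16), writing 16 bytes succeeds, while the 17th byte lands on the guard page and is reported as an error instead of silently corrupting adjacent memory.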
References
1. n1548, p. xiv
2. n1548, K.3.5.4.1
3. GNU. "Line Input". The GNU C Library. GNU. http://www.gnu.org/software/libc/manual/html_node/Line-Input.html#Line-Input. Retrieved 2008-08-02. "The gets function is very dangerous because it provides no protection against overflowing the string s. The GNU library includes it for compatibility only. You should always use fgets or getline instead." (Emphasis in original.)
4. "Why does everyone say not to use gets()?". comp.lang.c Frequently Asked Questions. Retrieved 2008-08-02.
5. "gets(3)". man. http://linux.die.net/man/3/gets. Retrieved 2008-08-02. "Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security."
6. "getdelim". The Open Group.