C Programming/string.h/strcpy

The C programming language offers a library function called strcpy, declared in the string.h header file, that copies a null-terminated string, including its terminating null byte, from one memory location to another. Since strings in C are not first-class data types and are implemented instead as contiguous blocks of bytes in memory, strcpy copies a string given two pointers: one to the source string and one to a block of allocated memory large enough to hold the copy.

The prototype of the function is:[1]

char *strcpy(char *destination, const char *source);

The argument order mimics that of an assignment: destination "=" source. The return value is destination.

Usage and implementation

For example:

char *str1 = "abcdefghijklmnop";
char *str2 = malloc(100); /* must be large enough to hold the string! */
strcpy(str2, str1); /* str2 is now "abcdefghijklmnop" */
str2[0] = 'A'; /* str2 is now "Abcdefghijklmnop" */
/* str1 is still "abcdefghijklmnop" */

In the second line memory is allocated to hold the copy of the string, then the string is copied from one memory block into the other, then the first letter of that copy is modified.

Although the simple assignment str2 = str1 might appear to do the same thing, it copies only the memory address held in str1 into str2, not the actual string. Both str1 and str2 would then refer to the same memory block, and the allocated block that used to be pointed to by str2 would be leaked. The assignment to str2[0] would then attempt to modify a string literal, which is undefined behavior: it might also change the string seen through str1, or it might cause an access violation (as modern compilers often place string constants in read-only memory).
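The difference between copying a string and copying a pointer can be sketched with a small helper. The name copy_string is hypothetical and not part of the C library; the sketch assumes the caller releases the returned block with free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated copy of src,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    /* +1 leaves room for the terminating null byte */
    char *copy = malloc(strlen(src) + 1);
    if (copy == NULL)
        return NULL;

    /* strcpy returns its destination argument */
    return strcpy(copy, src);
}
```

Because the result lives in its own block, modifying it leaves the source string untouched, which is not the case after a plain pointer assignment.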

The strcpy function performs a copy by iterating over the individual characters of the string and copying them one by one. An explicit implementation of strcpy is:

char *strcpy(char *dest, const char *src)
{
  size_t i; /* size_t (from stddef.h) can index any object; unsigned may be too narrow */
  for (i=0; src[i] != '\0'; ++i)
    dest[i] = src[i];

  /* Ensure trailing null byte is copied */
  dest[i]= '\0';

  return dest;
}

A common compact implementation is:

char *strcpy(char *dest, const char *src)
{
   char *save = dest;
   /* copy each byte, including the null byte that ends the loop */
   while ((*dest++ = *src++) != '\0')
      ;
   return save;
}

Modern versions provided by C libraries often copy far more than one byte at a time, relying on bit math to detect whether a larger word contains a null byte before writing it. Compilers also frequently treat strcpy as a built-in and expand a call inline, sometimes using machine instructions designed for string copying, instead of calling the library function.
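The bit math mentioned above can be illustrated with a well-known test for a zero byte inside a 32-bit word. This is only a sketch of the idea: the name has_zero_byte is hypothetical, and real library implementations differ in word size, alignment handling, and instruction selection.

```c
#include <stdint.h>

/* Returns nonzero if at least one of the four bytes in w is zero.
   Subtracting 1 from each byte sets its high bit only when the byte
   was zero or already had its high bit set; masking with ~w removes
   the second case, so any surviving high bit marks a zero byte. */
int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}
```

A word-at-a-time copy loop would load one word from the source, apply such a test, and fall back to byte-by-byte copying for the word that contains the terminator.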

Unicode

strcpy will work for all common byte encodings of Unicode strings, including UTF-8. There is no need to actually know the encoding, as long as the encoding never uses a null byte as part of a character.

If Unicode is encoded in units larger than a byte, such as UTF-16, then a different function is needed, as null bytes will occur in parts of the larger code units. C99 defines the function wcscpy(), declared in wchar.h, which copies wchar_t-sized objects up to and including the first one with a zero value. This is not as useful as it appears, as different computer platforms disagree on how large a wchar_t is (some use 16 bits and some 32 bits).
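A wide-character copy follows the same pattern as the byte version, except that the buffer size must be scaled by sizeof(wchar_t). The following is a minimal sketch; copy_wide_string is a hypothetical helper name, not a library function.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a heap-allocated copy of a wide string,
   or NULL if the allocation fails. The caller must free() the result. */
wchar_t *copy_wide_string(const wchar_t *src)
{
    /* wcslen counts wchar_t units, not bytes, so scale by the unit size */
    wchar_t *copy = malloc((wcslen(src) + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    /* wcscpy stops after copying the terminating zero-valued unit */
    return wcscpy(copy, src);
}
```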

Buffer overflows

strcpy can be dangerous because if the string to be copied is too long to fit in the destination buffer, it will write past the end of the buffer, which is undefined behavior and typically overwrites adjacent memory. Often the program simply crashes with a segmentation fault when this occurs, but a skilled attacker can use a buffer overflow to break into a system. To prevent buffer overflows, several alternatives to strcpy have been used. All of them take an extra argument, the length of the destination buffer, and will not write past the end of that buffer. All of them can still result in buffer overflows if an incorrect length is provided.
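The idea of passing the destination length can be sketched with a small wrapper that refuses to copy when the source does not fit. The name checked_copy and its return convention are assumptions made for this illustration; it is not one of the library alternatives.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string,
   including its terminating null byte, fits in dst_size bytes.
   Returns 0 on success and -1 (leaving dst untouched) otherwise. */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    /* strlen excludes the null byte, so a length equal to dst_size is too long */
    if (strlen(src) >= dst_size)
        return -1;

    strcpy(dst, src);
    return 0;
}
```

As with the library functions, the check is only as good as the length supplied: passing a dst_size larger than the real buffer still allows an overflow.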

strncpy

char* strncpy(char* dst, const char* src, size_t size);

strncpy writes exactly the given number of bytes, either only copying the start of the string if it is too long, or adding zeros to the end of the copy to fill the buffer. It was introduced into the C library to deal with fixed-length name fields in structures such as directory entries. Despite its name it is not a bounded version of strcpy; it does not guarantee that the result is a null-terminated string. The name is misleading because the similarly named strncat and snprintf are bounded versions of strcat and sprintf, respectively.

The assumption that the result is a null-terminated string leads to two problems. If the source string is too long, the result is not null-terminated, making data after the end of the buffer appear to be part of the string. And if the source string is much shorter than the buffer, considerable time will be wasted filling the rest of the buffer with null bytes.
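Both problems can be seen in a short sketch that calls strncpy and then terminates the buffer by hand. The helper name copy_truncating and its return value are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst with strncpy, then forces
   null termination. Returns 1 if the source was truncated, else 0.
   A zero-sized buffer cannot hold anything, so it counts as truncation. */
int copy_truncating(char *dst, size_t size, const char *src)
{
    int truncated;

    if (size == 0)
        return 1;

    /* Writes exactly size bytes: a prefix of src, or src plus zero padding */
    strncpy(dst, src, size);

    /* If the last byte is not null, strncpy ran out of room before the end of src */
    truncated = dst[size - 1] != '\0';
    dst[size - 1] = '\0';
    return truncated;
}
```

Without the explicit assignment to dst[size - 1], a source of size or more bytes would leave the buffer without any terminator.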

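An alternative from the standard C library that will always append exactly one null byte is to use strncat with an initially empty string as the destination and a count one less than the size of the buffer, since strncat writes the null byte in addition to the counted characters.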
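The strncat approach can be sketched as follows. The wrapper name copy_with_strncat is hypothetical; the sketch assumes size is the full size of the destination buffer in bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded copy built on strncat.
   Copies at most size - 1 characters and always null-terminates,
   provided size is greater than zero. Returns dst. */
char *copy_with_strncat(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return dst; /* no room even for the null byte */

    /* Start from an empty string so strncat appends at the beginning */
    dst[0] = '\0';

    /* strncat appends up to size - 1 characters, then one null byte */
    return strncat(dst, src, size - 1);
}
```

Unlike strncpy, this writes only one null byte, so the rest of a large buffer is left untouched.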


strlcpy

size_t strlcpy(char* dst, const char* src, size_t size);

The strlcpy function, created by OpenBSD developers Todd C. Miller and Theo de Raadt, is often regarded as a safer version of strncpy. Provided the size is not zero, it always adds a single null byte. It returns the length of the source string, so a return value greater than or equal to size indicates truncation and tells the caller how large a buffer is needed (the return value plus one), allowing the buffer to be reallocated if possible. It has been ported to a number of operating systems, but notably rejected by Ulrich Drepper, the glibc maintainer, who suggests that C programmers need to keep track of string length and that "using this function only leads to other errors."[2]
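Since strlcpy is not part of the C standard, the behavior described above can be illustrated with a portable sketch. The name sketch_strlcpy is hypothetical and this is not the OpenBSD source; it only mirrors the documented contract.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy-like behavior under a hypothetical name.
   Copies at most size - 1 bytes, null-terminates when size > 0,
   and returns strlen(src) so the caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        /* Copy the whole string if it fits, otherwise the longest prefix */
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller checks for truncation by comparing the return value with the buffer size: if the result is greater than or equal to size, the copy was cut short and a buffer of result + 1 bytes would hold the whole string.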

strcpy_s

errno_t strcpy_s(char* dst, rsize_t size, const char* src);

The strcpy_s function, proposed for standardisation in ISO/IEC TR 24731[3][4] and later included in the optional Annex K of the C11 standard, is supported by the Microsoft C Runtime Library[5] and some other C libraries. It returns non-zero if the source string does not fit, and sets the buffer to the empty string (not the prefix!). It is also explicitly unsupported by some libraries, including the GNU C library.[6] Some commentators have speculated that the warning messages produced by Microsoft's compilers, which suggest that programmers change strcpy and strncpy to this function, are an attempt by Microsoft to lock developers into its platform.[7][8]
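Because strcpy_s is unavailable in several C libraries, the failure behavior described above is shown here with a portable sketch. The name sketch_strcpy_s is hypothetical, the int return type stands in for errno_t, and the choice of EINVAL and ERANGE as error codes is an assumption of this sketch; real implementations may also invoke a constraint handler before returning.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Sketch of strcpy_s-like behavior under a hypothetical name.
   On failure the destination is set to the empty string rather than
   a truncated prefix, and a non-zero error code is returned. */
int sketch_strcpy_s(char *dst, size_t size, const char *src)
{
    if (dst == NULL || size == 0)
        return EINVAL; /* nothing can be written safely */

    if (src == NULL) {
        dst[0] = '\0';
        return EINVAL;
    }

    if (strlen(src) >= size) {
        dst[0] = '\0'; /* empty string, not the prefix */
        return ERANGE;
    }

    strcpy(dst, src);
    return 0;
}
```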

References

  1. ISO/IEC 9899:1999 specification, p. 326, § 7.21.2.3
  2. libc-alpha mailing list, selected messages from 8 Aug 2000 thread: 53, 60, 61
  3. ISO/IEC. ISO/IEC WDTR 24731 Specification for Secure C Library Functions. International Organization for Standardization. Retrieved 2008-04-23.
  4. Plakosh, Daniel. "strcpy_s() and strcat_s()". Pearson Education, Inc. Retrieved 2006-08-12.
  5. Microsoft. "Security Enhancements in the CRT". MSDN. Retrieved 2008-09-16.
  6. "Re: Implementing "Extensions to the C Library" (ISO/IEC WG14 N1172)".
  7. Danny Kalev. "They're at it again". InformIT.
  8. "Security Enhanced CRT, Safer Than Standard Library?".
