C Programming/Particularities of C
C is an efficient, minimalist language with some peculiarities that a programmer must be aware of. Sometimes a good way to address them is to combine C with another language for added flexibility and power, as Emacs does with Emacs Lisp and C. Sometimes they can be addressed, at the cost of slower speed and increased complexity, by using special constructs that guarantee correct function and security. Mostly, however, practiced C programmers have no trouble with the things mentioned here, and prefer a language that closely models the general-purpose von Neumann hardware architecture.
Below are several of these particularities of ANSI C (that sometimes are also its strengths), some minor and some major:
- Lack of differentiation between arrays and pointers
- The very first C (around 1973) did not have arrays at all; in modern C an array is a contiguous area of memory, and in most expressions its name is converted to a pointer to its first element, so elements are accessed with pointer arithmetic (note: a declared array cannot be assigned to like a pointer). This lets code work with arrays whose size is not fixed in the declaration that receives them, but it can cause buffer overflow errors with careless use.
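The difference between an array and a pointer to its first element shows up most clearly with sizeof. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Inside the scope that declares the array, sizeof reports the size of
   the whole array: 8 * sizeof(int). */
size_t size_in_caller(void)
{
    int values[8] = {0};
    (void)values;
    return sizeof values;
}

/* Once the array has been passed to a function, only a pointer to its
   first element arrives, so sizeof reports the size of a pointer. */
size_t size_in_callee(const int *values)
{
    return sizeof values;
}

/* Indexing is defined in terms of pointer arithmetic:
   p[2] and *(p + 2) name the same element. */
int third_element(const int *p)
{
    return *(p + 2);
}
```

Calling size_in_callee() with an array of any length gives the same result, which is why the length information is lost at the call boundary.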
- Arrays do not store their length
- A consequence of the above feature. A program may need to perform an explicit bounds check before accessing an array. Unless a function is passed an array of a fixed size, it has no way to discover the length of the array it was given, so the length must be supplied as well, perhaps as a separate argument or in a structure. Because of this, most implementations do not provide automatic array bounds checking, and manual bounds checking is error-prone.
- If a C (or C++) program accesses an array element outside the memory actually allocated, the behavior is undefined: a buffer overflow occurs, which typically crashes the program but can also silently corrupt other data. Buffer overflow bugs are also a common security vulnerability. Many other languages provide automatic bounds checking and so are nearly immune to such bugs. [1][2][3][4][5]
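One common way to cope with this is to pass the length alongside the pointer and check every index by hand. The sketch below assumes a hypothetical ARRAY_LEN macro and hypothetical helpers checked_get() and sum_ints(); none of them are part of the standard library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). Only valid in the
   scope where the array itself is declared. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Reads a[index] into *out only if index is in range. The caller must
   supply len, because the callee receives just a pointer. */
bool checked_get(const int *a, size_t len, size_t index, int *out)
{
    if (a == NULL || out == NULL || index >= len) {
        return false;            /* out-of-range access refused */
    }
    *out = a[index];
    return true;
}

/* Sums an array; again the length travels with the pointer. */
long sum_ints(const int *a, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++) {
        total += a[i];
    }
    return total;
}
```

A call such as checked_get(data, ARRAY_LEN(data), i, &v) only works where data is still an array; after it has been passed on as a pointer, the length has to be carried along explicitly.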
- Variable Length Arrays
- A VLA (variable-length array) can only be used for function parameters and auto variables. VLAs cannot be used inside a structure; the one related exception is that the last member of a structure may be an array of unspecified size (a flexible array member). It is not possible to define a structure that corresponds to the standard Forth dictionary definition (which has 2 variable-length parts), except as an undifferentiated array of char.
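A structure whose last member is an array of unspecified size (a C99 flexible array member) can be sketched as follows. The struct name_entry type and the name_entry_new() function are hypothetical, and the sketch deliberately has only one variable-length part, since a second one cannot be expressed this way.

```c
#include <stdlib.h>
#include <string.h>

/* Record with a single variable-length part. The flexible array member
   must be the last member and contributes no size of its own. */
struct name_entry {
    size_t length;   /* bytes stored in name, excluding the '\0' */
    char   name[];   /* C99 flexible array member */
};

/* Allocates header and payload in one block. Returns NULL on failure;
   the caller releases the entry with free(). */
struct name_entry *name_entry_new(const char *text)
{
    size_t n = strlen(text);
    struct name_entry *e = malloc(sizeof *e + n + 1);
    if (e == NULL) {
        return NULL;
    }
    e->length = n;
    memcpy(e->name, text, n + 1);   /* copies the terminating '\0' too */
    return e;
}
```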
- Arbitrary-size built-in 2D or 3D arrays are not widely supported
- Variable-length arrays, added in the C99 specification, provide this feature, although many C compilers still do not support them. Without VLAs, there is no way for a function to accept 2D or 3D arrays of arbitrary size. In particular, it is impossible to define a function that accepts int a[5][4][3]; on one call and int b[10][10][10]; on a later call. Instead of the built-in 2D or 3D array data type, C programmers use some other data type to hold (mathematical) 2D or 3D arrays of arbitrary size (multi-dimensional arrays) -- see C Programming/Common practices#Dynamic multidimensional arrays for details.
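With a compiler that does support C99 variable-length array parameters, a single function can accept both of the arrays mentioned above. This is a sketch under that assumption; sum3d() and fill3d() are hypothetical names.

```c
#include <stddef.h>

/* Sets every element of an n1 x n2 x n3 array to value. The dimensions
   are declared first so that they are in scope in the array declarator. */
void fill3d(size_t n1, size_t n2, size_t n3, int a[n1][n2][n3], int value)
{
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            for (size_t k = 0; k < n3; k++)
                a[i][j][k] = value;
}

/* Sums every element of an n1 x n2 x n3 array. */
long sum3d(size_t n1, size_t n2, size_t n3, int a[n1][n2][n3])
{
    long total = 0;
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            for (size_t k = 0; k < n3; k++)
                total += a[i][j][k];
    return total;
}
```

The same sum3d() can then be called as sum3d(5, 4, 3, a) and as sum3d(10, 10, 10, b). On compilers without VLA support this code is rejected, and one of the dynamic multidimensional array techniques has to be used instead.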
- No formal String data type
- Strings are character arrays (lacking any abstraction) and inherit all their constraints (structs can provide an abstraction, to an extent).
- Weak type safety
- C is not very type-safe. The memory management functions operate on untyped pointers, there is no built-in run-time type enforcement, and the type system can be circumvented with pointers and casts. Additionally, typedef does not create a new type but only an alias, so it serves solely to make code more legible. However, it is possible to use single-member structs to enforce type safety.
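The single-member struct technique can be sketched as follows. The meters and feet types and the two functions are hypothetical examples, not taken from any library.

```c
/* Two distinct types with the same underlying representation. Unlike
   two typedefs of double, the compiler will not mix them up silently. */
typedef struct { double value; } meters;
typedef struct { double value; } feet;

/* The only sanctioned way to turn feet into meters. */
meters feet_to_meters(feet f)
{
    meters m = { f.value * 0.3048 };
    return m;
}

/* Accepts meters only; passing a feet value here is a compile-time
   error, because the two struct types are incompatible. */
meters add_meters(meters a, meters b)
{
    meters sum = { a.value + b.value };
    return sum;
}
```

Had meters and feet been declared as typedef double, a call such as add_meters(some_feet, some_meters) would compile without complaint.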
- No garbage collection
- As a low-level language designed for minimum overhead, C features only manual memory management, which can allow simple memory leaks to go on unchecked.
- Local variables are uninitialized upon declaration
- Local (but not global or static) variables must be initialized manually; until then, they hold an indeterminate value, in practice whatever was already in that memory. Leaving locals uninitialized is not unusual among languages; what is unusual is that the C standard does not forbid reading them.
- Unwieldy function pointer syntax
- Function pointers take the form of [return type] (*[name])([arg1 type], [arg2 type]), making them somewhat difficult to use. Typedefs can alleviate this burdensome syntax. For example, typedef int fn(int i); names a function type, after which fn *p; declares a pointer to such a function. See C Programming/Pointers and arrays#Pointers to Functions for more details.
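The following sketch contrasts the raw declarator syntax with the typedef from the text. Apart from fn, all names (apply_each, twice, negate, operations) are hypothetical.

```c
#include <stddef.h>

/* Function type from the text: takes an int, returns an int. */
typedef int fn(int i);

/* Replaces each element with f(element). Without the typedef, the last
   parameter would be written as: int (*f)(int) */
void apply_each(int *values, size_t len, fn *f)
{
    for (size_t i = 0; i < len; i++) {
        values[i] = f(values[i]);
    }
}

int twice(int i)  { return 2 * i; }
int negate(int i) { return -i; }

/* Array of function pointers. The raw spelling would be:
   int (*const operations[])(int) = { twice, negate }; */
fn *const operations[] = { twice, negate };
```

A function name used as a value converts to a pointer to that function, so apply_each(v, 3, twice) needs no explicit address-of operator.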
- No reflection
- It is not possible for a C program -- at runtime -- to evaluate a string as if it were a source C code statement.
- Nested functions are not standard
- However, many C compilers do support nested functions, including GNU C.[6]
- No formal exception handling
- Some standard functions return special values that must be handled manually. For example, malloc() returns a null pointer upon failure. Likewise, one must store the return value of getchar() in an int (not, as one might expect, in a char) in order to reliably detect the end-of-file -- see EOF pitfall. Programs that do not include appropriate error handling might work fine most of the time, but can crash or otherwise malfunction when exceptional cases occur. POSIX systems often use signal() to handle some kinds of exceptions. (See C Programming/Error handling#Signals for details.) Some programs use setjmp(), longjmp() or goto to manually handle some kinds of exceptions. (See C Programming/Control#One last thing: goto and C Programming/Coroutines for details.)
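Two of the manual techniques mentioned here can be sketched briefly: keeping the result of a character-reading function in an int, and using setjmp() with longjmp() as a crude substitute for throw and catch. The function names count_bytes, digit_value and sum_digits are hypothetical, and fgetc() stands in for getchar() so that any stream can be read.

```c
#include <setjmp.h>
#include <stdio.h>

/* Counts the bytes remaining in a stream. The result of fgetc() is kept
   in an int so that EOF stays distinguishable from every valid byte. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                          /* int, not char */
    while ((c = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}

/* Jump target shared by the two functions below. */
static jmp_buf recover_point;

/* "Throws" on a non-digit by jumping back to the setjmp() call. */
static int digit_value(char c)
{
    if (c < '0' || c > '9') {
        longjmp(recover_point, 1);  /* setjmp() then returns 1 */
    }
    return c - '0';
}

/* Stores the sum of the decimal digits of s in *out and returns 0,
   or returns -1 if s contains a character that is not a digit. */
int sum_digits(const char *s, int *out)
{
    int total = 0;
    if (setjmp(recover_point) != 0) {
        return -1;                  /* reached via longjmp() */
    }
    for (; *s != '\0'; s++) {
        total += digit_value(*s);
    }
    *out = total;
    return 0;
}
```

The sketch never reads total after a longjmp(), which matters because non-volatile locals modified between setjmp() and longjmp() have indeterminate values afterwards. Nothing is released automatically during the jump either, so any memory or files acquired in between must be cleaned up by hand.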
- No anonymous function definitions
References
- [1] http://projects.webappsec.org/Buffer-Overflow
- [2] http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/buffer-overflow.html
- [3] http://searchsecurity.techtarget.com/news/article/0,289142,sid14_gci860185,00.html
- [4] http://www.owasp.org/index.php/Buffer_Overflows
- [5] http://cyclone.thelanguage.org/wiki/Why%20Cyclone
- [6] "A GNU Manual": "Extensions to the C Language: Nested Functions"