Last modified on 17 April 2014, at 18:56

C Programming/Preprocessor

Previous: Error handling Index Next: Libraries

Preprocessors are a way of making text processing with your C program before they are actually compiled. Before the actual compilation of every C program it is passed through a Preprocessor. The Preprocessor looks through the program trying to find out specific instructions called Preprocessor directives that it can understand. All Preprocessor directives begin with the # (hash) symbol. C++ compilers use the same C preprocessor.[1]

The preprocessor is a part of the compiler which performs preliminary operations (conditionally compiling code, including files etc...) to your code before the compiler sees it. These transformations are lexical, meaning that the output of the preprocessor is still text.

NOTE: Technically the output of the preprocessing phase for C consists of a sequence of tokens, rather than source text, but it is simple to output source text which is equivalent to the given token sequence, and that is commonly supported by compilers via a -E or /E option -- although command line options to C compilers aren't completely standard, many follow similar rules.

DirectivesEdit

Directives are special instructions directed to the preprocessor (preprocessor directive) or to the compiler (compiler directive) on how it should process part or all of your source code or set some flags on the final object and are used to make writing source code easier (more portable for instance) and to make the source code more understandable. Directives are handled by the preprocessor, which is either a separate program invoked by the compiler or part of the compiler itself.

#includeEdit

C has some features as part of the language and some others as part of a standard library, which is a repository of code that is available alongside every standard-conformant C compiler. When the C compiler compiles your program it usually also links it with the standard C library. For example, on encountering a #include <stdio.h> directive, it replaces the directive with the contents of the stdio.h header file.

When you use features from the library, C requires you to declare what you would be using. The first line in the program is a preprocessing directive which should look like this:

#include <stdio.h>

The above line causes the C declarations which are in the stdio.h header to be included for use in your program. Usually this is implemented by just inserting into your program the contents of a header file called stdio.h, located in a system-dependent location. The location of such files may be described in your compiler's documentation. A list of standard C header files is listed below in the Headers table.

The stdio.h header contains various declarations for input/output (I/O) using an abstraction of I/O mechanisms called streams. For example there is an output stream object called stdout which is used to output text to the standard output, which usually displays the text on the computer screen.

If using angle brackets like the example above, the preprocessor is instructed to search for the include file along the development environment path for the standard includes.

#include "other.h"

If you use quotation marks (" "), the preprocessor is expected to search in some additional, usually user-defined, locations for the header file, and to fall back to the standard include paths only if it is not found in those additional locations. It is common for this form to include searching in the same directory as the file containing the #include directive.

NOTE: You should check the documentation of the development environment you are using for any vendor specific implementations of the #include directive.

HeadersEdit

The C90 standard headers list:

Headers added since C90:

#pragmaEdit

The pragma (pragmatic information) directive is part of the standard, but the meaning of any pragma depends on the software implementation of the standard that is used. The #pragma directive provides a way to request special behavior from the compiler. This directive is most useful for programs that are unusually large or that need to take advantage of the capabilities of a particular compiler.

Pragmas are used within the source program.

#pragma token(s)
  1. pragma is usually followed by a single token, which represents a command for the compiler to obey. You should check the software implementation of the C standard you intend on using for a list of the supported tokens. Not surprisingly, the set of commands that can appear in #pragma directives is different for each compiler; you'll have to consult the documentation for your compiler to see which commands it allows and what those commands do.

For instance one of the most implemented preprocessor directives, #pragma once when placed at the beginning of a header file, indicates that the file where it resides will be skipped if included several times by the preprocessor.

NOTE: Other methods exist to do this action that is commonly referred as using include guards.



#defineEdit

WARNING: Preprocessor macros, although tempting, can produce quite unexpected results if not done right. Always keep in mind that macros are textual substitutions done to your source code before anything is compiled. The compiler does not know anything about the macros and never gets to see them. This can produce obscure errors, amongst other negative effects. Prefer to use language features, if there are equivalent (In example use const int or enum instead of #defined constants).

That said, there are cases, where macros are very useful (see the debug macro below for an example).

The #define directive is used to define values or macros that are used by the preprocessor to manipulate the program source code before it is compiled. Because preprocessor definitions are substituted before the compiler acts on the source code, any errors that are introduced by #define are difficult to trace.

By convention, values defined using #define are named in uppercase. Although doing so is not a requirement, it is considered very bad practice to do otherwise. This allows the values to be easily identified when reading the source code.

Today, #define is primarily used to handle compiler and platform differences. E.g., a define might hold a constant which is the appropriate error code for a system call. The use of #define should thus be limited unless absolutely necessary; typedef statements and constant variables can often perform the same functions more safely.

Another feature of the #define command is that it can take arguments, making it rather useful as a pseudo-function creator. Consider the following code:

#define ABSOLUTE_VALUE( x ) ( ((x) < 0) ? -(x) : (x) )
...
int x = -1;
while( ABSOLUTE_VALUE( x ) ) {
...
}

It's generally a good idea to use extra parentheses when using complex macros. Notice that in the above example, the variable "x" is always within its own set of parentheses. This way, it will be evaluated in whole, before being compared to 0 or multiplied by -1. Also, the entire macro is surrounded by parentheses, to prevent it from being contaminated by other code. If you're not careful, you run the risk of having the compiler misinterpret your code.

Because of side-effects it is considered a very bad idea to use macro functions as described above.

int x = -10;
int y = ABSOLUTE_VALUE( x++ );

If ABSOLUTE_VALUE() were a real function 'x' would now have the value of '-9', but because it was an argument in a macro it was expanded twice and thus has a value of -8.

Example:

To illustrate the dangers of macros, consider this naive macro

#define MAX(a,b) a>b?a:b

and the code

i = MAX(2,3)+5;
j = MAX(3,2)+5;

Take a look at this and consider what the value after execution might be. The statements are turned into

int i = 2>3?2:3+5;
int j = 3>2?3:2+5;

Thus, after execution i=8 and j=3 instead of the expected result of i=j=8! This is why you were cautioned to use an extra set of parenthesis above, but even with these, the road is fraught with dangers. The alert reader might quickly realize that if a or b contains expressions, the definition must parenthesize every use of a,b in the macro definition, like this:

#define MAX(a,b) ((a)>(b)?(a):(b))

This works, provided a,b have no side effects. Indeed,

i = 2;
j = 3;
k = MAX(i++, j++);

would result in k=4, i=3 and j=5. This would be highly surprising to anyone expecting MAX() to behave like a function.

So what is the correct solution? The solution is not to use macro at all. A global, inline function, like this

inline int max(int a, int b) { 
  return a>b?a:b 
}

has none of the pitfalls above, but will not work with all types.

NOTE: The explicit inline declaration is not really necessary unless the definition is in a header file, since your compiler can inline functions for you (with gcc this can be done with -finline-functions or -O3). The compiler is often better than the programmer at predicting which functions are worth inlining. Also, function calls are not really expensive (they used to be).

The compiler is actually free to ignore the inline keyword. It is only a hint (except that inline is necessary in order to allow a function to be defined in a header file without generating an error message due to the function being defined in more than one translation unit).


(#, ##)

The # and ## operators are used with the #define macro. Using # causes the first argument after the # to be returned as a string in quotes. For example, the command

#define as_string( s ) # s

will make the compiler turn this command

puts( as_string( Hello World! ) ) ;

into

puts( "Hello World!" );

Using ## concatenates what's before the ## with what's after it. For example, the command

#define concatenate( x, y ) x ## y
...
int xy = 10;
...

will make the compiler turn

printf( "%d", concatenate( x, y ));

into

printf( "%d", xy);

which will, of course, display 10 to standard output.

It is possible to concatenate a macro argument with a constant prefix or suffix to obtain a valid identifier as in

#define make_function( name ) int my_ ## name (int foo) {}
make_function( bar )

which will define a function called my_bar(). But it isn't possible to integrate a macro argument into a constant string using the concatenation operator. In order to obtain such an effect, one can use the ANSI C property that two or more consecutive string constants are considered equivalent to a single string constant when encountered. Using this property, one can write

#define eat( what ) puts( "I'm eating " #what " today." )
eat( fruit )

which the macro-processor will turn into

puts( "I'm eating " "fruit" " today." )

which in turn will be interpreted by the C parser as a single string constant.

The following trick can be used to turn a numeric constants into string literals

#define num2str(x) str(x)
#define str(x) #x
#define CONST 23

puts(num2str(CONST));

This is a bit tricky, since it is expanded in 2 steps. First num2str(CONST) is replaced with str(23), which in turn is replaced with "23". This can be useful in the following example:

#ifdef DEBUG
#define debug(msg) fputs(__FILE__ ":" num2str(__LINE__) " - " msg, stderr)
#else
#define debug(msg)
#endif

This will give you a nice debug message including the file and the line where the message was issued. If DEBUG is not defined however the debugging message will completely vanish from your code. Be careful not to use this sort of construct with anything that has side effects, since this can lead to bugs, that appear and disappear depending on the compilation parameters.

macrosEdit

Macros aren't type-checked and so they do not evaluate arguments. Also, they do not obey scope properly, but simply take the string passed to them and replace each occurrence of the macro argument in the text of the macro with the actual string for that parameter (the code is literally copied into the location it was called from).

An example on how to use a macro:

 #include <stdio.h>

 #define SLICES 8
 #define ADD(x) ( (x) / SLICES )

 int main(void) 
 {
   int a = 0, b = 10, c = 6;

   a = ADD(b + c);
   printf("%d\n", a);
   return 0;
 }

-- the result of "a" should be "2" (b + c = 16 -> passed to ADD -> 16 / SLICES -> result is "2")

NOTE:
It is usually bad practice to define macros in headers.

A macro should be defined only when it is not possible to achieve the same result with a function or some other mechanism. Some compilers are able to optimize code to where calls to small functions are replaced with inline code, negating any possible speed advantage. Using typedefs, enums, and inline (in C99) is often a better option.

One of the few situations where inline functions won't work -- so you are pretty much forced to use function-like macros instead -- is to initialize compile time constants (static initialization of structs). This happens when the arguments to the macro are literals that the compiler can optimize to another literal. [2]

#errorEdit

The #error directive halts compilation. When one is encountered the standard specifies that the compiler should emit a diagnostic containing the remaining tokens in the directive. This is mostly used for debugging purposes.

#error message

#warning#warning

Many compilers support a #warning directive. When one is encountered, the compiler emits a diagnostic containing the remaining tokens in the directive.

  1. warning message

#undef#undef

The #undef directive undefines a macro. The identifier need not have been previously defined.

#if,#else,#elif,#endif (conditionals)#if,#else,#elif,#endif (conditionals)

The #if command checks whether a controlling conditional expression evaluates to zero or nonzero, and excludes or includes a block of code respectively. For example:

 #if 1
 /* This block will be included */
 #endif
 #if 0
 /* This block will not be included */
 #endif

The conditional expression could contain any C operator except for the assignment operators, the increment and decrement operators, the address-of operator, and the sizeof operator.

One unique operator used in preprocessing and nowhere else is the defined operator. It returns 1 if the macro name, optionally enclosed in parentheses, is currently defined; 0 if not.

The #endif command ends a block started by #if, #ifdef, or #ifndef.

The #elif command is similar to #if, except that it is used to extract one from a series of blocks of code. E.g.:

 #if /* some expression */
   :
   :
   :
 #elif /* another expression */
   :
 /* imagine many more #elifs here ... */
   :
 #else
 /* The optional #else block is selected if none of the previous #if or
    #elif blocks are selected */
   :
   :
 #endif /* The end of the #if block */

#ifdef,#ifndef#ifdef,#ifndef

The #ifdef command is similar to #if, except that the code block following it is selected if a macro name is defined. In this respect,

#ifdef NAME

is equivalent to

#if defined NAME


The #ifndef command is similar to #ifdef, except that the test is reversed:

#ifndef NAME

is equivalent to

#if !defined NAME

Useful Preprocessor Macros for DebuggingUseful Preprocessor Macros for Debugging

ANSI C defines some useful preprocessor macros and variables,[3][4] also called "magic constants", include:

__FILE__ => The name of the current file, as a string literal
__LINE__ => Current line of the source file, as a numeric literal
__DATE__ => Current system date, as a string
__TIME__ => Current system time, as a string
__TIMESTAMP__ => Date and time (non-standard)
__cplusplus => undefined when your C code is being compiled by a C compiler; 199711L when your C code is being compiled by a C++ compiler compliant with 1998 C++ standard.
__func__ => Current function name of the source file, as a string (part of C99)
__PRETTY_FUNCTION__ => "decorated" Current function name of the source file, as a string (in GCC; non-standard)

Compile-time assertionsCompile-time assertions

Some people[5] define a preprocessor macro to allow compile-time assertions, something like:

#define COMPILE_TIME_ASSERT(pred) switch(0){case 0:case pred:;}
 
COMPILE_TIME_ASSERT( BOOLEAN CONDITION );

The static_assert.hpp Boost library defines a similar macro. Some compilers define a static_assert keyword used in the same way.[6]

Such compile-time assertions can help you debug faster than using only run-time assert() statements, because the compile-time assertions are all tested at compile time, while it is possible that a test run of a program may fail to exercise some run-time assert() statements.

X-MacrosX-Macros

One little-known usage pattern of the C preprocessor is known as "X-Macros".[7][8][9][10] An X-Macro is a header file or macro. Commonly these use the extension ".def" instead of the traditional ".h". This file contains a list of similar macro calls, which can be referred to as "component macros". The include file is then referenced repeatedly in the following pattern. Here, the include file is "xmacro.def" and it contains a list of component macros of the style "foo(x, y, z)".

#define foo(x, y, z) doSomethingWith(x, y, z);
#include "xmacro.def"
#undef foo
 
#define foo(x, y, z) doSomethingElseWith(x, y, z);
#include "xmacro.def"
#undef foo
 
(etc...)

The most common usage of X-Macros is to establish a list of C objects and then automatically generate code for each of them. Some implementations also perform any #undefs they need inside the X-Macro, as opposed to expecting the caller to undefine them.

Common sets of objects are a set of global configuration settings, a set of members of a struct, a list of possible XML tags for converting an XML file to a quickly-traversable tree, or the body of an enum declaration; other lists are possible.

Once the X-Macro has been processed to create the list of objects, the component macros can be redefined to generate, for instance, accessor and/or mutator functions. Structure serializing and deserializing are also commonly done.

Here is an example of an X-Macro that establishes a struct and automatically creates serialize/deserialize functions. For simplicity, this example doesn't account for endianness or buffer overflows.

File star.def:

EXPAND_EXPAND_STAR_MEMBER(x, int)
EXPAND_EXPAND_STAR_MEMBER(y, int)
EXPAND_EXPAND_STAR_MEMBER(z, int)
EXPAND_EXPAND_STAR_MEMBER(radius, double)
#undef EXPAND_EXPAND_STAR_MEMBER

File star_table.c:

typedef struct {
  #define EXPAND_EXPAND_STAR_MEMBER(member, type) type member;
  #include "star.def"
  } starStruct;
 
void serialize_star(const starStruct *const star, unsigned char *buffer) {
  #define EXPAND_EXPAND_STAR_MEMBER(member, type) \
    memcpy(buffer, &(star->member), sizeof(star->member)); \
    buffer += sizeof(star->member);
  #include "star.def"
  }
 
void deserialize_star(starStruct *const star, const unsigned char *buffer) {
  #define EXPAND_EXPAND_STAR_MEMBER(member, type) \
    memcpy(&(star->member), buffer, sizeof(star->member)); \
    buffer += sizeof(star->member);
  #include "star.def"
  }

Handlers for individual data types may be created and accessed using token concatenation ("##") and quoting ("#") operators. For example, the following might be added to the above code:

#define print_int(val)    printf("%d", val)
#define print_double(val) printf("%g", val)
 
void print_star(const starStruct *const star) {
  /* print_##type will be replaced with print_int or print_double */
  #define EXPAND_EXPAND_STAR_MEMBER(member, type) \
    printf("%s: ", #member); \
    print_##type(star->member); \
    printf("\n");
  #include "star.def"
  }

Note that in this example you can also avoid the creation of separate handler functions for each datatype in this example by defining the print format for each supported type, with the additional benefit of reducing the expansion code produced by this header file:

#define FORMAT_(type) FORMAT_##type
#define FORMAT_int    "%d"
#define FORMAT_double "%g"
 
void print_star(const starStruct *const star) {
  /* FORMAT_(type) will be replaced with FORMAT_int or FORMAT_double */
  #define EXPAND_EXPAND_STAR_MEMBER(member, type) \
    printf("%s: " FORMAT_(type) "\n", #member, star->member);
  #include "star.def"
  }

The creation of a separate header file can be avoided by creating a single macro containing what would be the contents of the file. For instance, the above file "star.def" could be replaced with this macro at the beginning of:

File star_table.c:

#define EXPAND_STAR \
  EXPAND_STAR_MEMBER(x, int) \
  EXPAND_STAR_MEMBER(y, int) \
  EXPAND_STAR_MEMBER(z, int) \
  EXPAND_STAR_MEMBER(radius, double)

and then all calls to #include "star.def" could be replaced with a simple EXPAND_STAR statement. The rest of the above file would become:

typedef struct {
  #define EXPAND_STAR_MEMBER(member, type) type member;
  EXPAND_STAR
  #undef  EXPAND_STAR_MEMBER
  } starStruct;
 
void serialize_star(const starStruct *const star, unsigned char *buffer) {
  #define EXPAND_STAR_MEMBER(member, type) \
    memcpy(buffer, &(star->member), sizeof(star->member)); \
    buffer += sizeof(star->member);
  EXPAND_STAR
  #undef  EXPAND_STAR_MEMBER
  }
 
void deserialize_star(starStruct *const star, const unsigned char *buffer) {
  #define EXPAND_STAR_MEMBER(member, type) \
    memcpy(&(star->member), buffer, sizeof(star->member)); \
    buffer += sizeof(star->member);
  EXPAND_STAR
  #undef  EXPAND_STAR_MEMBER
  }

and the print handler could be added as well as:

#define print_int(val)    printf("%d", val)
#define print_double(val) printf("%g", val)
 
void print_star(const starStruct *const star) {
  /* print_##type will be replaced with print_int or print_double */
  #define EXPAND_STAR_MEMBER(member, type) \
    printf("%s: ", #member); \
    print_##type(star->member); \
    printf("\n");
  EXPAND_STAR
  #undef EXPAND_STAR_MEMBER
}

or as:

#define FORMAT_(type) FORMAT_##type
#define FORMAT_int    "%d"
#define FORMAT_double "%g"
 
void print_star(const starStruct *const star) {
  /* FORMAT_(type) will be replaced with FORMAT_int or FORMAT_double */
  #define EXPAND_STAR_MEMBER(member, type) \
    printf("%s: " FORMAT_(type) "\n", #member, star->member);
  EXPAND_STAR
  #undef EXPAND_STAR_MEMBER
  }

A variant which avoids needing to know the members of any expanded sub-macros is to accept the operators as an argument to the list macro:

File star_table.c:

/*
 Generic
 */
#define STRUCT_MEMBER(member, type, dummy) type member;
 
#define SERIALIZE_MEMBER(member, type, obj, buffer) \
  memcpy(buffer, &(obj->member), sizeof(obj->member)); \
  buffer += sizeof(obj->member);
 
#define DESERIALIZE_MEMBER(member, type, obj, buffer) \
  memcpy(&(obj->member), buffer, sizeof(obj->member)); \
  buffer += sizeof(obj->member);
 
#define FORMAT_(type) FORMAT_##type
#define FORMAT_int    "%d"
#define FORMAT_double "%g"
 
/* FORMAT_(type) will be replaced with FORMAT_int or FORMAT_double */
#define PRINT_MEMBER(member, type, obj) \
  printf("%s: " FORMAT_(type) "\n", #member, obj->member);
 
/*
 starStruct
 */
 
#define EXPAND_STAR(_, ...) \
  _(x, int, __VA_ARGS__) \
  _(y, int, __VA_ARGS__) \
  _(z, int, __VA_ARGS__) \
  _(radius, double, __VA_ARGS__)
 
typedef struct {
  EXPAND_STAR(STRUCT_MEMBER, )
  } starStruct;
 
void serialize_star(const starStruct *const star, unsigned char *buffer) {
  EXPAND_STAR(SERIALIZE_MEMBER, star, buffer)
  }
 
void deserialize_star(starStruct *const star, const unsigned char *buffer) {
  EXPAND_STAR(DESERIALIZE_MEMBER, star, buffer)
  }
 
void print_star(const starStruct *const star) {
  EXPAND_STAR(PRINT_MEMBER, star)
  }

This approach can be dangerous in that the entire macro set is always interpreted as if it was on a single source line, which could encounter compiler limits with complex component macros and/or long member lists.

This technique was reported by Lars Wirzenius[11] in a web page dated January 17, 2000, in which he gives credit to Kenneth Oksanen for "refining and developing" the technique prior to 1997. The other references describe it as a method from at least a decade before the turn of the century.

  1. Understanding C++/C Preprocessor
  2. David Hart, Jon Reid. "9 Code Smells of Preprocessor Use". 2012.
  3. HP C Compiler Reference Manual
  4. C++ reference: Predefined preprocessor variables
  5. "Compile Time Assertions in C" by Jon Jagger 1999
  6. Wikipedia: C++0x#Static assertions
  7. Wirzenius, Lars. C Preprocessor Trick For Implementing Similar Data Types Retrieved January 9, 2011.
  8. Meyers, Randy (May 2001). "The New C: X Macros". Dr. Dobb's Journal. http://www.ddj.com/cpp/184401387. Retrieved 1 May 2008. 
  9. Beal, Stephan (August 2004). Supermacros. http://wanderinghorse.net/computing/papers/#supermacros. Retrieved 27 October 2008. 
  10. Keith Schwarz. "Advanced Preprocessor Techniques". 2009. Includes "Practical Applications of the Preprocessor II: The X Macro Trick".
  11. Wirzenius, Lars. C Preprocessor Trick For Implementing Similar Data Types Retrieved January 9, 2011.
Previous: Error handling Index Next: Libraries