Programming Language Concepts Using C and C++/Introduction to Programming in C

What follows is an assortment of simple C programs presented to provide an exposure to the basics of the language. In order to better achieve this purpose, examples are kept simple and short. Unfortunately, for programs of such scale certain criteria become less important than they normally should be. One such criterion is source code portability, which can be ensured by checking programs against standards compliance. This criterion, if not pursued diligently from the start, is very difficult to fulfill.

For easier tackling of the problem, many compilers offer some assistance. Although the degree of support and its form may vary, it’s worth taking a look at the compiler switches. For the compilers we will be using in the context of this class an incomplete list of applicable options are given below.

Option* Meaning Valid in
-Wall Provide warnings for questionable programming practices, such as missing function return types, unused identifiers, switch statements without a default, and so on. Both
-std Check for compliance witha particular standard. gcc
-pedantic Check programs for strict standards compliance and reject all programs using non-standard extensions. gcc
-Tc Similar to -pedantic MS Visual C/C++
*: Options supported by MS Visual C/C++ can also be prefixed with '/'.
Example: Compiler option usage.

Using gcc, provide the required compilation command needed to ensure any sloppy programming does not go unnoticed. Also make sure your source code can be easily ported to other development environments.

First requirement is met by passing -Wall, while the second one is fulfilled by using the -pedantic option. So, the required compilation command looks like below.

gcc -Wall -pedantic [other options] filename

Same purpose can be achieved in MS Visual C/ C++ by the following command:

cl [other options] /Wall /Tc filename

C Preprocessor edit

The C preprocessor is a simple macro processor—a C-to-C translator—that conceptually processes the source text of a C program before the compiler proper reads the source program. Generally, the preprocessor is actually a separate program that reads the original source file and writes out a new "preprocessed" source file that can then be used as input to the C compiler. In other implementations, a single program performs the preprocessing and compilation in a single pass over the source file. The advantage of the former scheme is, apart from its more modular structure, the possibility of translators for other programming languages using the preprocessor.

The preprocessor is controlled by special preprocessor command lines, which are lines of the source file beginning with the character #.

The preprocessor removes all preprocessor command lines from the source file and makes additional transformations on the source file as directed by the commands.

The name of the command must follow the # character.[1] A line whose only non-whitespace character is a # is termed null directive in ISO C and is treated the same as blank line.

Defining a Macro edit

#define preprocessor command causes a name to become defined as a macro to the preprocessor. A sequence of tokens, called the body of the macro, is associated with the name. When the name of the macro is recognized in the program source text or in the arguments of certain other preprocessor commands, it is treated as a call to that macro; the name is effectively replaced by a copy of the body. If the macro is defined to accept arguments, then the actual arguments following the macro name are substituted for formal parameters in the macro body. Argument passing mechanism used is akin to call-by-name. However, one should not forget the fact that text replacement is carried out by the preprocessor, not the compiler.

This macro directive can be used in two different ways.

  • Object-like macro definitions: #define name sequence-of-tokens?, where ? stands for zero or one occurrence of the preceding entity. Put another way, the body of the macro may be empty.
Example: Object-like macro definition

#define BOOL char /* Not #define BOOL=char */ #define FALSE 0 #define TRUE 1 #define CRAY

Note that there is no equal sign. Neither is a semicolon required to terminate the lines. Leading and trailing white space around the token sequence is discarded.

Redefinition of a macro is allowed in ISO C provided that the new definition is the same, token for token, as the existing definition. Redefining a macro with a different definition is possible only if an undef directive is issued before the second definition.

  • Defining Macros with Parameters: #define name(name1, name2, ..., namen) sequence-of-tokens?. The left parenthesis must immediately follow the name of the macro with no intervening whitespace. Such a macro definition would be interpreted as an object-like macro that starts with a left parenthesis. The names of the formal parameters must be identifiers, no two the same. A function-like macro can have an empty formal parameter list. This is used to simulate a function with no arguments.
Example: Macro with a parameter
#define increment(n) (n + 1)

When a function-like macro call is encountered, the entire macro call is replaced, after parameter processing, by a processed copy of the body. The entire process of replacing a macro call with the processed copy of its body is called macro expansion; the processed copy of the body is called the expansion of the macro call. Therefore, with the above definition, a = 3 * increment(a); would be expanded to a = 3 * (a + 1);. Now that preprocessing takes place prior to compilation, the compiler proper does not even see the identifier increment in the source code. All it sees is the expanded form of the macro.

#define product(x, y) (x * y)
The above innocent looking code is unfortunately wrong. It won’t produce the correct result for result = product(a + 3, b);. This line, after preprocessing, will be transformed into result = (a + 3 * b);. Not exactly what we wanted!
Instead, we must write
#define product(x, y) ((x) * (y))
With this definition in place, the previous example will be expanded as result = ((a + 3) * (b));. Note that the programmer does not see the superfluous parentheses. She sees the un-preprocessed form of the program text.
Example: An incorrect macro definition
#define SQUARE(x) x * x
This definition would fail to produce the correct result for a + 3. It would yield a + 3 * a + 3 instead of (a + 3) * (a + 3). This can be taken care of by putting parentheses around x’s.
#define SQUARE(x) (x) * (x)
However, this second version fails to produce the correct result for result = (long) SQUARE(a + 3);, which is expanded to result = (long) (a + 3) * (a + 3);
In the example above, only the first subexpression would be cast into a long. In order to apply the cast operator to the entire expression, we should further put parentheses around the entire body of the macro. So, the correct version of the macro is:
#define SQUARE(x) ((x) * (x))
Is this an end to the troubles? Unfortunately, no! What if we use the macro in the following way? This would be expanded to result = ((x++) * (x++));
result = SQUARE(x++);
The problem with this expression is that the side effect due to the increment operator takes place twice and x is multiplied by x + 1, not itself. This [once again] goes to show that one has to avoid writing side effect producing expressions.

The problems we faced in the above example may be attributed to the textual nature of the parameter passing mechanism used: call-by-name: Each time the parameter name appears in the text, it is replaced by the exact text of the argument; each time it is replaced, it is evaluated once again.


#define swap(x, y) \ { unsigned long temp = x; x = y; y = temp; }

\ at the end of the line is used for line continuation. For

if (x < y) swap(x, y); else x = y;

we get

if (x < y) { unsigned long temp = x; x = y; y = temp; }; else x = y;

The semicolon after the closing brace is extraneous and causes a compile-time error. A way out of this is given below.

#define FALSE 0 #define swap(x, y) \ do { unsigned long temp = x; x = y; y = temp; } while(FALSE)

After expansion we will have

if (x < y) do { unsigned long temp = x; x = y; y = temp; } while(FALSE); else x = y;

Predefined Macros edit

Macro Value
__LINE__ The line number of the current source program line, expressed as a decimal integer constant
__FILE__ The name of the current source file, expressed as a string constant
__DATE__ The calendar date of the translation, expressed as a string constant of the form "Mmm dd yyyy"
__TIME__ The time of the translation, expressed as a string constant of the form "hh:mm:ss"
__STDC__ The decimal constant, if and only if the compiler is an ISO conforming implementation
__STDC_VERSION__ If the implementation conforms to Amendment 1 of ISO C, then this macro is defined and has the value 199409L

Undefining a Macro edit

#undef name
#undef causes the preprocessor to forget any macro definition of name. It is not an error to undefine a name that is currently not defined. Once a name has been undefined, it may then be given a completely new definition without error.
Example: Undefining a macro.

#define square(x) ((x) * (x)) ... a = square(b); ... ... #undef square ... ... c = square(d++); ...

The first application of square will use the macro definition, while the second one will call the function.

Macros vs. Functions edit

Choosing macro definition over function makes sense when

  • efficiency is a concern,
  • the macro definition is simple and short, or used infrequently in the source, and
  • the parameters are evaluated only once.

Macro definition is efficient because preprocessing takes place before runtime. It actually takes place even before the compiler proper starts. On the other hand, function call is a rather expensive instruction and it is executed in the runtime. In this sense, a macro can be used to simulate inlining of a function.

Macro definitions are expected to be simple and short because replacing large bodies of text in the source will give rise to code-bloat, especially when it is used frequently.

Justification of the third requirement was given previously.

On the down side, it should be kept in mind that the preprocessor does not do any type-checking on macro parameters.

Code Inclusion edit

#include <filename>
#include "filename"
#include preprocessor-tokens
The #include preprocessor command causes the entire contents of a specified source file to be processed as if those contents had appeared in place of the #include command.

The general intent is that the "filename" form is used to refer to the header files written by the programmer, whereas the <filename> form is used to refer to standard implementation files. The third form undergoes normal macro expansion and the result must match one of the first two forms.

Conditional Preprocessing edit

#if constant-expression
These commands are used together to allow lines of source text to be conditionally included or excluded from the compilation. It comes handy when we need to produce code for different architectures (e.g., Vax and Intel) or in different modes (e.g., debugging mode and production mode).
#ifdef name
#ifndef name
These two commands are used to test whether a name is defined as a preprocessor macro. They are equivalent to #if defined(name) and #if !defined(name), respectively.

Notice that #if name and #ifdef name are not equivalent. Although they work exactly the same way when name is undefined, the parity breaks when name is defined to be 0. In that case, the former will be false while the latter will be true.

defined name
The defined operator can be used only in #if and #elif expressions; it is not a preprocessor command but a preprocessor operator.

Emitting Error Messages edit

#error preprocessor-tokens
The #error directive produces a compile-time error message that will include the argument tokens, which are subject to macro expansion.
Example: #error macro

#if (TABLESIZE > 1000) #error "Table size cannot exceed 1000" #endif

Compile-Time Switches edit

In addition to the use of the #define command; we can use a compile-time switch to define a macro.

Example: Defining a macro on the command line.
gcc –DTABLESIZE=500 prog_name↵
will treat the program as if #define TABLESIZE 500 were the first line of the source.
Example: Conditional preprocessing via a compiler switch.
gcc –DVAX prog_name↵
The above command will define a macro named VAX and (preprocess and) compile a program named prog_name.

#ifdef VAX
VAX-dependent code
#endif #ifdef PDP11
PDP11-dependent code
#endif #ifdef CRAY2
CRAY2-dependent code

With VAX defined by a compile-time switch and these definitions in the source, only the VAX-dependent code will be processed. Code related to other architectures will be ignored. When we want to compile our program for a different architecture, all we have to do is to change the compile-time switch to the relevant architecture.
A novice (or malicious) programmer might define more than one architecture macro at command line.
gcc –DVAX –DCRAY2 prog_name↵
In this case, the compiled program will contain code meant for two different architectures. This can be avoided by the following preprocessor commands.

#if defined(VAX)
VAX-dependent code
#elif defined(PDP-11)
PDP11-dependent code
#elif defined(CRAY2)
CRAY2-dependent code
#else #error "Target architecture not specified/known! Aborting compilation..." #endif

Example: Debugging mode compilation
In prog_name,

... #if defined(DEBUG)
Debugging mode code
#endif ...

The command gcc –DDEBUG prog_name↵ will produce (object) code that contains some extra statements for debugging. Once debugging phase is over, you can recompile the program with gcc prog_name↵ and debugging mode code will not be included in the object code. Nice thing about this is that you do not need to remove the debugging mode code from the source!

char Data Type edit

Problem: Write a program that prints out the characters 'a'..'z', 'A'..'Z', and '0'..'9' along with their encoding.


The following preprocessor directive pulls the contents of the file named stdio.h into the current file. Once this file is included, its contents are interpreted by the preprocessor.

For each function we use we need to bring in its prototype, whose location can be found out by means of the man command found in UNIX-based environments: man function_name will provide you with information about funtion_name, including the header file it's declared in.

Included files generally contain declarations shared among different applications. Constant declarations and function prototypes are examples to these. In this case, we must include stdio.h in order to pull in the prototype of the printf function.

A function prototype consists of the function return type, the name of the function, and the parameter list. It describes the interface of the function: it details the number, order, and types of parameters that must be provided when the function is called and the type of value that the function returns.

A function prototype helps the compiler ensure correct usage of a particular function. For instance, given the declaration in stdio.h, one cannot pass an int as the first argument to the printf function.

Time and again you will see the words prototype and signature used interchangeably. This however is not correct. Signature of a function is its list of argument types; it does not include the function return type.

#include <stdio.h>

All runnable C programs must contain a function named main. This function serves as the entry point to the program and is called from within the C runtime, which is executed after the executable is loaded, as part of the initialization code.

The system software needed to copy a program from secondary storage into the main memory so that it’s ready to run is called a loader. In addition to bringing the program into the memory, it can also set protection bits, arrange for virtual memory to map virtual addresses to disk pages, and so on.

The formal parameter list of the following main function consists of the single keyword void, which means that it does not take any parameters at all. You will at times see C codes with int main() or main() written instead of :int main(void). In this context all three serve the same purpose although the first two should be avoided. A function with no return type in its declaration/ definition is assumed to return a value of type int. A function prototype with an empty formal parameter list is an old style declaration that tells the compiler about the presence of a function, which can take any number of arguments of any type. It also has the unasked for side effect of turning off prototype checking from that point on. So, one should avoid such usages.

int main(void) {

Although we will manipulate characters, the variable used to hold the characters, ch, is declared as an int. The reasoning for this is as follows: Absence of the sign qualifier (signed or unsigned) in the original language design gave compiler implementers the freedom to interpret char as a type comprising values in [0..255]—because characters can be indexed by nonnegative values only—or as a type comprising values in [–128..127]—because it is an integer type represented in a single byte. These two views basically limit the range of char to [0..127].

Some information on ASCII

Due to the lack of exceptions in C, most functions dealing with characters and character strings need to represent exceptional situations, such as end-of file, as unlikely return values. This means, in addition to the legitimate characters, we should be able to encode the exceptional situations as different values. ASCII, this implies a representation that can hold 128 + n signed values, where n is the number of exceptional conditions to be dealt with. When taken together with the conclusion drawn in the previous paragraph, it can be seen that we need a representation that is larger than type char. Therefore, you'll often see an int variable being used to hold values of type char.

Encountering a character constant the C compiler replaces it with an integer constant that corresponds to the order of the character in the encoding table. For instance, 'a' in ASCII is replaced with 97, which is an integer constant of type int.

Had we used char instead of int our program would still be working correctly. This is because we did not have to deal with any exceptional conditions in the program and all int constants used are within the limits of char representation. That is, all narrowing implicit conversions, as that would take place in the initialization statement of the for loop, are value preserving.

  int ch;
  for(ch = 'a'; ch <= 'z'; ch++)

printf function is used to format and send the formatted output on the standard output file stdout, which by default is the screen. It is a variadic function: that is, it takes a variable length argument list. The first argument is taken to be a format control string, which is used to figure out the type and number of arguments. This is accomplished through the use of special character sequences starting with %. When encountered, such a sequence is replaced with the corresponding actual parameter, if the actual parameter can be used in the context implied by the character sequence. For instance, '%c' in the following line means that the corresponding argument must be interpretable as a character. Likewise, '%d' is a placeholder for a decimal number.

Note that printf actually returns an int. Not assigning this return value to some variable means that it is ignored.

    printf("Encoding of '%c' is %d\n", ch, ch);
  printf("Press any key to continue\n"); getchar();

  for(ch = 'A'; ch <= 'Z'; ch++)
    printf("Encoding of '%c' is %d\n", ch, ch);
  printf("Press any key to continue\n"); getchar();
  for(ch = '0'; ch <= '9'; ch++)
    printf("Encoding of '%c' is %d\n", ch, ch);

} /* end of int main(void) */

Compiling and Running a Program in Linux Command Line edit

Given that this program is saved under the name Encoding.c, it can be compiled (and linked) using

gcc –o AlphaNumerics Encoding.c↵
GCC (GNU Compiler Collection) is, as its name implies, a collection of compilers for various programming languages. gcc, the C/C++ frontend to GCC, performs the preprocessing, compilation, assembly, and link phases required for creating an execuable from C source files.

gcc invokes the GNU C/C++ compiler driver, which first gets the C-preprocessor process Encoding.c and passes the transformed source file to the compiler proper. The output of the compiler proper, an assembly program, is later passed to the assembler. The object code file assembled by the assembler is finally handed to the linker, which links it with the standard C library and stores the executable in a disk file whose name is provided with -o option. Note that you do not have to tell the driver to link with the standard C library. This is a special case, though. With other libraries you must tell the compiler driver the libraries and object files to be linked with. This whole scheme basically creates a new external command named AlphaNumerics. Issuing the command


at the command line will eventually cause the loader to load the program from the secondary storage to memory and run it.

Compiling and Running a Program Using Emacs edit

emacs &

This command will put you in the Emacs development environment. Click on File→Open... and pick Encoding.c from the file list. This will open up a new C-mode buffer within the current window. Note that the second line from the bottom shows (C Abbrev), which means Emacs has identified your source code as a C program. Next, click on Tools→Compile►→Compile.... This will prompt you to enter the command needed to compile (and link) the program. This prompt will be printed in an area called the minibuffer, which is normally the very last line of the frame. Erase the default selection and write

gcc –o AlphaNumerics.exe Encoding.c↵

As you hit the enter key, you will see a *compilation* buffer pop up that lets you know about how the compilation process is proceeding. Hoping you don’t make any typos and everything goes smoothly, next thing we will do is to run the executable. In order to do this, click on Tools→Shell►→Shell. This will open a restricted shell inside a Shell mode buffer, from where you can run your executables. Type in


within this buffer and you will see the same output as you saw in the previous section.

In case you end up with compilation errors, clicking on an error line in the *compilation* buffer takes you to the relevant source line.

If you want to go back to the source code and make some changes, click on Buffers→Encoding.c. When you are through with the changes, you can compile the source code once again by clicking on Tools→Compile►→Repeat Compilation. This will recompile Encoding.c using the command entered above. However, in case you may want to modify the command, click on Tools→Compile►→Compile... and proceed as you did before.

Non-portable Version edit

Assuming ASCII, you may be tempted to replace all character constants in the program with the corresponding integer values. This is strongly discouraged since it will make the program non-portable.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int ch;

The following line contains embedded constants, which cause the code to be non-portable. What if we wanted to move our code to some environment where EBCDIC is used to encode characters? So, one should avoid embedding such implementation dependent features into the programs and let the compiler do the dirty work.

Note also, since the same action is taken by the compiler, probable motivation—speeding up the program—for replacing character literals with integer constants is not well-founded, either.

  for(ch = 97; ch <= 122; ch++)
    printf(Encoding of %c is %d\n, ch, ch);
  printf(Press any key to continue\n);

  for(ch = 65; ch <= 90; ch++)
    printf(Encoding of %c is %d\n, ch, ch);
  printf(Press any key to continue\n);

  for(ch = 48; ch <= 57; ch++)
    printf(Encoding of %c is %d\n, ch, ch);

The exit function causes the program to terminate, returning the value passed to it as the result of executing the program. Same effect can be achieved by returning an integer value from the main function. By convention, a value of 0 signifies successful termination while a nonzero value is used to signify abnormal termination.

} /* end of int main(void) */

Compiling and Running a Program Using MS Visual C/C++ edit

First of all, make sure you execute vcvars32.bat, which you can find in the bin subdirectory of the MS Visual C/C++ directory. This will set some environment variables you need for correct operation of the command line tools.

cl /FeAscii_Enc.exe Ascii_Encoding.c↵

Similar to gcc, this command will go through preprocess, compile, assemble, and link phases. Upon successful completion, we can run our program simply by issuing the name of the executable filename at the command line. The operating system shell will recognize the resultant executable as an external command and get the loader to load the Ascii_Enc.exe into memory and eventually run it.


Compiling and Running a Program Using DGJPP-rhide edit

Start a new DOS box and enter the following command.

rhide Ascii_Encoding.c↵

This will start a DOS-based IDE that you can use to develop projects in different programming languages. Choose Compile→Make or Compile→Build All or Compile→Compile followed by Compile→Link. This will compile the source code and link the resulting object module with the C runtime. You can now run the executable by clicking on Run→Run or by exiting to DOS by choosing File→DOS Shell and entering the filename at the prompt. (In case you may see unexpected behavior from rhide, make sure the file is not too deep inside the directory hierarchy and the names do not contain special characters such as space.) If you choose the second option you can return to rhide by typing exit at the command prompt.

Macros edit

Problem: Write a program that prints a greeting in English or Turkish. The language should be chosen by a compile-time switch. The name(s) of the person(s) to be greeted is passed to the program through a command-line argument.

#include <stdio.h>

The following line checks to see whether a macro named TURKISH has already been defined or not. Definition of this macro, or any macro for that matter, can be made within a file or at the command line prompt as a compiler switch. In this example, there is no such definition in the file or any included file. So, absence of such a macro as a compiler switch will cause the control jump to the #else part and the statements between #else and #endif are included in the source file. Given the definition is made at the command line prompt, the statements between #if and #else are included in the source file. Whichever section of code is included, one thing is certain: either the part between #if and #else or the part between #else and #endif is included, not both; there is no danger of duplicate definition.

Notice the peculiar way of naming the variables. This is the so-called Hungarian notation. By prefixing specially interpreted characters to the identifier name, this method aims at providing the fellow programmers/maintainers as much context information as possible. Without any reference to the definition of an identifier, which can be pages apart or even in a different source file that is inaccessible to us, we can now garner the required information simply by interpreting the prefix. For instance, szGreeting is meant to be a string ending with zero (that is, a C-style string).

#if defined(TURKISH)
  char szGreeting[] = "Gunaydin,";
  char szGreetStranger[] = "Her kimsen gunaydin!";
  char szGreetAll[] = "Herkese gunaydin!";
  char szGreeting[] = "Good morning,";
  char szGreetStranger[] = "Good morning, whoever you might be!";
  char szGreetAll[] = "Good morning to you all!";

C does not allow the programmer to overload function names. main function is an exception to this: it comes in two flavors. The first one, which we have already seen, does not take any arguments. The second one permits the user to pass command line arguments to the application. These arguments are passed in a vector of pointers to character. If the user wants the arguments to be interpreted as belonging to some other data type, the application code has to do some extra processing.

In C, there is no standard way of telling the size of an array. One should either use a convention or hold the size in another variable. Character strings in C, which may be considered as an array of characters, are an example to the former. Here, a sentinel value (NULL character) is used to signify the end of the string. For most instances, such a scheme turns out to be impossible or infeasible. In that case we need to hold the size (or length) information in a separate variable. Hence is the need for a second variable.

The program name is the first component of the argument vector. So, if the argument count is one it means the user has not passed any arguments at all.

int main(int argc, char *argv[]) {
  switch (argc) {

Execution of a break statement will terminate the innermost enclosing loop (while, for, and do while) or switch statement. That is, it will basically jump to the next line following the loop or switch statement. In our case, control will be transferred to the return statement.

Newcomers to C from a Pascal-based background must be careful about the nature of the switch statement: Unlike the case statement, where each branch is executed mutually, switch lets more than one branch be executed. If that’s not what you want you must delimit the branches with break statements, as was done below. Without the break statements in place an argument count of one would cause all three messages to be printed. Similarly, an argument count of two would print anonymous message together with the appropriate one.

    case 1: printf("%s\n", szGreetStranger); break;
    case 2: printf("%s %s\n", szGreeting, argv[1]); break;
    default: printf("%s\n", szGreetAll);
  } /* end of switch (argc) */

} /* end of int main(int, char**) */

Using Compiler Switches edit

Saving this C program as Greeting.c and issuing

gcc Greeting.c –DTURKISH –o Gunaydin↵ # In Linux

at the command line will produce an executable named Gunaydin. This executable will not include object code for any of the statements between #else and #endif. Similarly, if we compile the program using

gcc Greeting.c –DENGLISH –o GoodMorning↵

we will get an executable named GoodMorning with the statements between #if defined and #else excluded. Note that with no macros defined at the command line, the English version of the greetings will be included.

One should not mistake compile-time switches for command-line arguments. The former is passed to the preprocessor and used to alter the code to be compiled, while the latter is passed to the running program to alter its behavior. Provided that we build our executables as shown above

./Gunaydin Tevfik↵

will produce as output

Gunaydin, Tevfik


./GoodMorning Tevfik Ugur↵

will produce

Good morning to you all!

Pointer Arithmetic edit

Problem: Write a program to demonstrate the relationship between pointers and addresses.

#include <stdio.h>

int string_length(char *str) {
  int len = 0;

The third part in the following for loop increments the variable str, which is a pointer to char. If we fail to realize that address and pointer are two different things, we might be tempted to think that what we do is simply incrementing the address value. But this would be utterly wrong. Although, for pedagogical purposes, we may assume a pointer to be an address, they are not one and the same thing. When we increment a pointer, the address value held in it is incremented by as much as the size of the type of value pointed to by the pointer. However, as far as pointer to char is concerned, incrementing a pointer and address has the same effect. This is due to the fact that a char value is stored in a single byte.

Definition: A pointer is a variable that contains the address of a variable, content of which is interpreted to belong to a certain type.[2]


Note that although a handle on an object may be regarded as a "kind of" pointer, they are two different concepts. Along with other differences such as support for inheritance, unlike pointers and addresses, handles do not take part in arithmetic operations.

  for(; *str != '\0'; str++) len++;

  return len;
} /* end of int string_length(char *) */

long sum_ints(int *seq) {
  long sum = 0;

Next line is an example showing the difference between a pointer and an address. Here, incrementing seq will increase the value of the address held in it by the size of an int.

  for (; *seq > 0; seq++) sum += *seq;

  return sum;
} /* end of long sum_ints(int *) */

int main(void) {

Next line creates and initializes an array of chars. The compiler computes the size of this array. What the compiler does is basically count the number of characters in between double quotes and reserve memory accordingly. Note that the compiler automatically appends a NULL character too. However, if we choose to use aggregate initialization, we have to be a bit more cautious.

char string[] = {G, o, k, c , e};

will create an array of 5 characters, not 6. There won’t be any NULL character at the end of the character array. If that’s not what we really want we should either add the NULL character in initialization as in

char string[] = {G, o, k, c, e, ‘\0};

or revert back to the former style. In both cases, the array is allocated in the runtime stack.

Note also the use of escape sequences for embedding double quotes in the character string. Since it is used to flag the end it's not possible to insert '”' in a string literal. The way out of this is by means of escaping '”', which tells the compiler that the following '”' does not end the character string, it should be literally embedded in the string.

  char string[] = "Kind of \"long\" string";
  int i;
  int sequence[] = {9, 1, 3, 102, 7, 5, 0};

Third argument of the next printf statement is a call to a function that returns an int. What’s more interesting with this call is the way its argument is passed. Although we seem to be passing an array, what is really passed behind the scenes is the address of the first component of the array; that is, &string[0]. This is done regardless of the array size. Advantages of such a scheme are:

  1. All we need to pass is a single pointer, not an entire array. As the array size grows, we save more on memory.
  2. Now that we pass a single pointer we avoid copying the entire array. This means saving both on time and memory.
  3. The changes we make in the array are permanent; the caller sees the changes made by the callee. This is due to the fact that it is the pointer that is passed by value, not the array. So, although we cannot change the address of the first component of the array, we can modify the contents of the array.

The down side of it is that you need to make a local copy of the array if you want the array to remain the same between calls.

  printf("Length of \"%s\": %d\n", string, string_length(string));
  printf("Sum of ints ( ");
  for (i = 0; sequence[i] != 0; i++)
    printf("%d ", sequence[i]);
  printf("): %ld\n", sum_ints(sequence));

} /* end of int main(void) */

Bit Manipulation edit

Originally a systems-programming language, C offers assistance for manipulation of data on a bit-by-bit basis. This comprises bitwise operations and a facility to define structure with bit fields.

Not surprisingly, this aspect of the language is utilized in machine-dependent applications such as real-time systems, system programs [device drivers, for instance] where running speed is a primary concern. Given the sophisticated optimizations done by today's compilers and the non-portable nature of bit manipulation, in the presence of an alternative their use in general-purpose programming should be avoided.

Bitwise Operations edit

Problem: Write functions to extract exponent and mantissa of a float argument.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>


Specification of C does not standardize (with the exception of type char) the size of integer types. The only guarantee given by the language specification is the following relation:

sizeof(short)sizeof(int)sizeof(long)sizeof(long long)

On most machines an int takes up four bytes, whereas on some it takes up two bytes. A menifestation of C's system programming orientation, this variance is due to the fact that size of int was meant to be equal to the word size of the underlying architecture. As a consequence of this, if we take it for granted that four bytes are used to represent an int value, we may occasionally see our programs producing insensible output. This will be due to the fact that an int value representable in four bytes will probably give rise to an overflow when represented in two bytes.

The following #if-#else directive is used to circumvent this problem. UINT_MAX, defined in limits.h, holds the largest possible unsigned int value. On machines where an int value is stored in two bytes, this will be equal to 65535. Otherwise, it will be something else. So, if UINT_MAX happens to be 65535 we can say an int is represented in two bytes. If not, it is represented in four bytes.

#if UINT_MAX==65535
typedef unsigned long BIG_INT;
typedef unsigned int BIG_INT;

char *usage = USAGE: Extract_Fields float_number;

Next function pulls the exponent part of a particular float variable. This is done by isolating the exponent part and shifting the resulting value to right. We use bitwise-and operator (binary &) to isolate the number and shift-right (>>) this isolated exponent (in a way right-adjust the number).

Note that result of right-shifting a negative integer—that is, a number with a bit pattern having 1 in the most significant bit—is undefined. In other words, the behavior is implementation dependent. While in some implementations the sign bit is preserved and therefore right-shifting is effectively sign-extending the number, in others this bit is replaced with a 0.[3]

BIG_INT exponent(float num) {
  float *fnump = &num;
  BIG_INT *lnump = (BIG_INT *) fnump;

Partial image of memory at this point is given in Figure 3. If, say, num has -1.5 as its value, it will be interpreted as -1.5 when seen through fnump. That is, *fnump will be -1.5. However, if seen through lnump, it will be interpreted to contain 3,217,031,168! The difference is due to different ways of looking at the same thing: While *fnump sees num as a float value compliant with the IEEE 754/854, *lnump sees the same four bytes of memory as an integer value (of type unsigned long or unsigned int, depending on the machine the program is running on) encoded using 2’s complement notation.

(insert figure here)

What would be the value of *fnump after we increment *lnump by 1?

The following line first uses a bit mask to isolate the exponent part and then shifts it to right so that the exponent bits occupy the least significant numbers.

(insert figure here)

  return((*lnump & 0x7F800000) >> 23);
} /* end of BIG_INT exponent(float) */

Next function extracts the mantissa part of the number. It does this simply by masking out the sign and exponent parts of the number. This is accomplished by the bitwise-and operator in the return expression. Note that we do not need to shift the number since its mantissa is made up of the lowest 23 digits of the representation.

BIG_INT mantissa(float num) {
  float *fnump = &num;
  BIG_INT *lnump = (BIG_INT*) fnump;

  return(*lnump & 0x007FFFFF);
} /* end of BIG_INT long mantissa(float) */

The following function tries to make sense of the command line arguments. Number of such arguments being one means the user has not passed any numbers. We display an appropriate message and exit the program. If the argument count is two the second one is taken to be the number whose components we will extract. Otherwise, the user has passed more arguments than we could deal with; we simply let her know about it and exit the program.

In case of failure it’s a recommended practice to exit with a nonzero value. This may turn out to be of great help when programs are run in coordination by means of a shell script. If a program depends on the successful completion of another, the script needs to have a reliable way of checking the result of the previous programs. A program exiting with non-descriptive values is of no help in such situations. We have to make sure that when we return with zero, it really means successful execution. Otherwise there is something wrong, which may further be elucidated by different exit codes.

strtod function is used to convert a numeric string, which may also contain -/+ and e/E, to a floating-point number of type double. While it’s easy to figure out the function of the first argument, which is a pointer to the string to be converted, one cannot say the same thing about the second argument. Upon return from strtod the second argument will hold a pointer to the character following the converted part of the input string. Thanks to this, we can process the rest of the string using strtod and its friends.

Friends of strtod
Declaration Description In Case of Error
long strtol(const char* str, char **ptr, int base): Converts a character string to a long int. Returns 0L.
unsigned long strtoul(const char* str, char **ptr, int base): Converts a string to an unsigned long int. Returns 0L.
double atof(const char* str): Converts str to a floating point number and returns the result of this conversion as a double. Returns 0.0.
int atoi(const char* str): Converts str to an integer and returns the result of this conversion as an int. Returns 0.
unsigned long atol(const char* str): Converts str to an integer and returns the result of this conversion as an unsigned long. Returns 0L.
char* strtok(char* str, const char* set): Tokenizes str using characters in set as separators. Passing NULL for the first argument tells this function to pick up from where it left off the last time.

#include <stddef.h> #include <stdio.h> #include <stdlib.h> #include <string.h> int main(int argc, char *argv[]) { char *name, *midterm, *final, *assignment; // Assume argv[1] is “Selin Yardimci: 80, 90, 100”. name = strtok(argv[1], ":"); // // printf("Name: %s\n", name); midterm = strtok(NULL, ","); // // printf("Midterm: %lu\t", strtoul(midterm, NULL, 10)); final = strtok(NULL, ","); // // printf("Final: %lu\t", strtoul(final, NULL, 10)); assignment = strtok(NULL, "\0"); // // printf("Assignment: %lu\n", strtoul(assignment, NULL, 10)); return(0); } /* end of int main(int, char**) */

gcc -o Parse.exe Parse.c↵
# Double-quotes are needed to treat the string as a single argument. It is not a part of the argument string and will be stripped before passing to main!
Parse "Selin Yardimci: 80, 90, 100"↵
Name: Selin Yardimci
Midterm: 80 Final: 90 Assignment: 100
float number(int argc, char *argv[]) {
  switch (argc) {
    case 1: 
      printf(Too few arguments. %s\n, usage);
    case 2:
      return((float) strtod(argv[1], NULL));
      printf(Too many arguments. %s\n, usage);
  } /* end of switch (argc) */
} /* end of float number(int, char **) */

Observe the simplicity of the main function. We just state what the program does; we do not delve in the details of how it does that. Reading the main function, the maintainer of this code can easily figure out what it claims to do. In case she may need more detail, she has to check the function bodies. Depending on the complexity of the program, these functions will provide full details of the implementation or defer this provision to another function. In a complex program this deferment can be extended to multiple levels.

However simple or complex the problem at hand might be and whichever paradigm we use, we start by answering the question "What" and then move (probably in degrees) on to answering the question "How". In other words, we first analyze the problem and come up with a design for the solution and then provide an implementation.[4] Our code should reflect this: it should first expose the answer to "what" (interface) and then (to the interested parties) the answer to "how" (implementation).

int main(int argc, char *argv[]) {
  float num;

  printf(Exponent of the number is: %x\n, 
            exponent(num = number(argc, argv)));
  printf(Mantissa of the number is: %x\n, mantissa(num));

} /* end of int main(int, char**) */

Bit Fields edit

Problem: Using bit fields, implement the previous problem.

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>


Bit fields are defined similar to ordinary record fields. The only difference between the two is the width specification following the bit field. According to the definition below fraction, exponent, and sign occupy twenty-three, eight, and one bit, respectively. However, rest is pretty much up to the compiler implementation.

To start with, as is manifested by the preprocessor directive, memory layout of a variable of type struct SINGLE_FP depends on the processor endianness. Manner of bit packing is also not guaranteed to be the same among different hardware. These two factors effectively limit the use of bit fields to machine-dependent programs.

struct SINGLE_FP {
#ifdef BIG_ENDIAN /* e.g. Motorola */
  unsigned int sign : 1;
  unsigned int exponent : 8;
  unsigned int fraction : 23;
#else /* if LITTLE_ENDIAN, e.g. Intel */
  unsigned int fraction : 23;
  unsigned int exponent : 8;
  unsigned int sign : 1;

BIG_INT exponent(float num) {
  float *fnump = &num;
  struct SINGLE_FP *lnump = (struct SINGLE_FP *) fnump;

There are two field access operators in C: . and ->. The former is used to access fields of a structure, whereas the latter is used to access fields of a structure through a pointer to the structure. Now that lnump is defined to be a pointer to the SINGLE_FP structure, all variables declared to be of type SINGLE_FP can access the bit fields through the use of ->.

Observe structure->field is equivalent to (*structure).field. For example, lnump->exponent is equivalent to (*lnump).exponent

} /* end of BIG_INT exponent(float) */

BIG_INT mantissa(float num) {
  float *fnump = &num;
  struct SINGLE_FP *lnump = (struct SINGLE_FP *) fnump;  

} /* end of BIG_INT mantissa(float) */

float number(int argc, char *argv[]) { ... }

int main(int argc, char *argv[]) { ... }

Static Local Variables (Memoization) edit

Problem: Using memoization write a low cost recursive factorial function.

Reminiscent of caching memoization can be used to speed up programs by saving results of computations that have already been made. The difference between the two lies in their scopes. When we speak of caching we speak of a system- or application-wide optimization technique. On the other hand, memoization is a function-specific technique. When a request is received we first check to see whether we can avoid computing the result from scratch. Otherwise, computation is carried out from scratch and the result is added to our database. In other words, we prefer computation-in-space over computation-in-time and save some precious computer time.

Applying this technique to the problem at hand we will store the set of already computed factorials in a static local array. Now that any changes made to this array will be persistent between different calls, basis condition for recursion will be changed to reaching a computed factorial, not an argument value of 1 or 0. That is, we have

Mathematical definition of the factorial function

#include <stdio.h>

#define TRUE 1

unsigned long long factorial(unsigned char n) {

As soon as the program starts running the following initializations will have taken effect. As a matter of fact, since static local variables are allocated in the static data region initial values for them are present in the disk image of the executable.

Mark the initializer of the array used for storing the already made computations. Although it has MAX_COMPUTABLE_N components only two values are provided in the initializer. The remaining slots are filled with the default initial value 0. In other words, it is equivalent to

static unsigned long long computed_factorials[MAX_COMPUTABLE_N] = {1, 1, 0, 0, , .., 0};

Note that we could provide more initial values to avoid more of the initial cost.

static unsigned char largest_computed = 1;
static unsigned long long computed_factorials[MAX_COMPUTABLE_N] = {1, 1};

If we have already computed the factorial of a number that is equal to or greater than the current argument value, we retrieve this value and return it to the caller.

Note that receiver of the returned value can be either the main function or another invocation of the factorial function. In the second case, we do part of the computation and use some partial result from the previous computations.

  if (n <= largest_computed) return computed_factorials[n];

  printf("N: %d\t", n);

Once new values are computed it is stored in our array and the largest argument value for which factorial has been computed is appropriately updated to reflect this fact.

  computed_factorials[n] = n * factorial(n - 1);
  if (largest_computed < n) largest_computed = n;

  return computed_factorials[n];
} /* end of unsigned long long factorial(unsigned char) */

int main(void) {
  short n;

  printf("Enter a negative value for termination of the program...\n");
  do {
    printf("Enter an integer in [0..20]: ");

h preceding the conversion letter (d) specifies that input is expected to be a short int. Similarly, one can use l to specify a long int.

    scanf("%hd", &n);
    if (n < 0) break;
    if (n > 20) {
      printf("Value out of range!!! No computations done.\n");

Like break continue is used to alter the flow of control inside loops; it terminates the execution of the innermost enclosing while, do while, and for statements. In our case, control will be transferred to the beginning of the do-while statement.

    } /* end of if (n > 20) */
    printf("%d! is %Lu\n", n, factorial((unsigned char) n));
  } while (TRUE);

} /* end of int main(void) */
gcc –o MemoizedFactorial.exe Memoize.c↵
Enter a negative value for termination of the program
Enter an integer in [0..20]: 1
1! is 1
Enter an integer in [0..20]: 5
N:5 N:4 N:3 N:2 5! is 120
Enter an integer in [0..20]: 5
5! is 120
Enter an integer in [0..20]: 10
N:10 N:9 N:8 N:7 N:6 10! is 3628800
Enter an integer in [0..20]: -1

File Manipulation edit

Problem: Write a program that strips comments of a C program.

#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BOOL char
#define FALSE 0
#define TRUE 1

#define MAX_LINE_LENGTH 500


When an identifier definition is qualified with const, it is taken to be immutable. Such an identifier cannot appear as an lvalue. This means we cannot declare an identifier to be constant and subsequently assign a value to it; a constant must be provided with a value at its point of declaration. In other words, a constant must be initialized.

Initializer of a global constant can contain nothing but subexpressions that can be evaluated at the compile-time. A local constant on the other hand can contain run-time values. For instance,

#include <stdio.h> #include <stdlib.h> void f(int ipar) { const int i = ipar * 3; printf(i: %d\n, i); } /* end of void f(int) */ int main(void) { int i = 6; f(5); f(10); f(i); f(i + 7); exit(0); } /* end of int main(void) */

will produce the following output:

i: 15
i: 30
i: 18
i: 39

This goes to show that constants are created for each function invocation and they can get different values. But they still cannot be modified throughout the function call.

const char file_ext[] = .noc;
const char temp_file[] = TempFile;
const char usage[] = "USAGE: StripComments filename";

In C, all identifiers must be declared before they are used! This includes file names, variables, and structure tags. The corresponding definition can be provided after the declaration in the same file or in a different one. While there can be more than one declaration, there can be only one definition.

Following declarations are prototypes of functions whose definitions are provided later in the program. Note the parameter names do not agree with the parameter names provided in the definition. As a matter of fact, one does not even have to provide the names. But, you still have to list the parameter types so as to facilitate type checking: It is an error if the definition of a function or any uses of it do not agree with its prototype.

The use of prototypes can be avoided by rearranging the order of definitions. For this example, putting the main function at the end would remove the need for the prototypes.

char* filename(int argumentCount, char* argumentVector[]);
void trimWS(const char *filename);
void strip(const char *filename);

Comma-separated expressions are considered a single expression result of which is the result returned by the last expression. Evaluation of a comma-separated expression is strictly from left to right; the compiler cannot change this order.

int main(int argc, char* argv[]) {
  char *fname = filename(argc, argv);
  (strip(fname), trimWS(fname));

} /* end of int main(int, char**) */

char* filename(int argc, char* argv[]) {

A more general version of printf, fprintf performs output formatting and sends the output to the stream specified as the first argument. We can therefore see printf as an equivalent form of the following:

fprintf(stdin, "...", ...); /* read it as "file printf..." */

Another friend of printf that you may want to consider in output formatting is the sprintf function. This function, instead of writing the output to some media, causes it to be stored in a string of characters.

typedef double currency; currency expenditure = 234.0; ... char *str = (char*) malloc(); ... sprintf(str, %f.2$, expenditure); printf(Total spending: %s, str); /* will print “Total spending: 234.00$” */ ... free(str); ...

  switch (argc) {
    case 1: 
      fprintf(stderr, "No file name passed!\n %s\n", usage);
    case 2: return(argv[1]);
      fprintf(stderr, "Too many arguments!\n %s\n",usage);
  } /* end of switch(argc) */
} /* end of char* filename(int, char**) */

void trimRight(char *line) {
  int i = strlen(line) - 1;

  while (i >= 0 && isspace(line[i])) ;

  line[i + 1] = '\n';
  line[i + 2] = '\0';
} /* end of void trimRight(char*) */

void trimWS(const char *infilename) {
  char next_line[MAX_LINE_LENGTH];
  char outfilename[strlen(infilename) + strlen(file_ext) + 1];
  FILE *infile, *outfile;
  BOOL empty_line = FALSE;
MS Visual C/C++ compiler will emit a compilation error for the highlighted line. However, according to the ISO C, automatic—that is, local—arrays may have an unknown size and this size is determined by the initializer in the run-time. In order for this program to compile using MS Visual C/C++, you need to change this array definition to a pointer definition and allocate/deallocate space for the array in the heap by using the malloc/free functions.

The following statement tries to open a file in reading mode. It maps the physical file, whose name is held in variable temp_file, to the logical file named infile. If this attempt results in success every operation you apply on the logical file will be executed on the physical file. infile variable can be thought of as a handle on the real file. The mapping between the handle and the physical file is not one-to-one. Just like more than one handle can show the same object, one can have more than one handle on the same physical file. There is no problem as long as all handles are used in read mode. But things get ugly if different handles simultaneously try to modify the same file. [The keywords are operating systems, concurrency, and synchronization.]

If the open operation fails we cannot get a handle on the physical file. That is reflected in the value returned by the fopen function: NULL. A pointer having a NULL value means that we cannot use it for further manipulation. All we can do is to test it against NULL. So, we check for this condition first. Unless it is NULL we proceed; otherwise we write something about the nature of the exceptional condition to the standard error file stderr and exit the program.

Just like the standard output, the standard error file is by default mapped to the screen. So, why do we write to stderr instead of stdio? The answer is, we may choose to re-map these standard files to different physical units. In such a case if we kept writing everything to the same logical file, say stdio, errors would clutter valid output data; we would have difficulty telling which one is which.

  infile = fopen(temp_file, "r");
  if (infile == NULL) {
    fprintf(stderr, "Error in opening file %s: %s\n", temp_file, strerror(errno));
  } /* end of if (infile == NULL) */

  strcpy(outfilename, infilename); strcat(outfilename, file_ext);
  outfile = fopen(outfilename, "w");
  if (outfile == NULL) {
    fprintf(stderr, "Error in opening file %s: %s\n", outfilename, strerror(errno));
  } /* end of if (outfile == NULL) */

  while (fgets(next_line, MAX_LINE_LENGTH + 1, infile)) {
    if (strlen(next_line) == 1) {
      if (!empty_line) fputs(next_line, outfile);
      empty_line = TRUE;
      } else {
        fputs(next_line, outfile);
        empty_line = FALSE;
      } /* end of else */
  } /* end of while (fgets(next_line, …) */

  fclose(infile); fclose(outfile); remove(temp_file);

} /* end of void trimWS(const char*) */

void strip(const char *filename) {
  int next_ch;
  BOOL inside_comment = FALSE;
  FILE *infile, *outfile;

  infile = fopen(filename, "r");
  if (infile == NULL) {
    fprintf(stderr, "Error in opening file %s: %s\n", filename, strerror(errno));
  } /* end of if (infile == NULL) */

  outfile = fopen(temp_file, "w");
  if (outfile == NULL) {
    fprintf(stderr, "Error in opening file %s: %s\n", temp_file, strerror(errno));
  } /* end of if (outfile == NULL) */

The solution to the problem can be modeled using the following finite automaton.

Finite automaton providing the solution

Note the ease of transforming the FA-based solution to C code. This is yet another example of how useful, however useless and boring it might look at first, such a theoretic model might be.

A problem is solved by transforming its problem domain representation to the corresponding solution domain representation. The more one knows about models to represent a problem the more easily she will come up with the solution of the problem.

  while ((next_ch = fgetc(infile)) != EOF) {
    switch (inside_comment) {
      case FALSE:
        if (next_ch != '/') { fputc(next_ch, outfile); break; }
        if ((next_ch = fgetc(infile)) == '*') inside_comment = TRUE;
          else { fputc('/', outfile); ungetc(next_ch, infile); }
      case TRUE:
        if (next_ch != '*') break; 
        if ((next_ch = fgetc(infile)) == '/') inside_comment = FALSE;
    } /* end of switch(inside_comment) */
  } /* end of while ((next_ch = fgetc(infile)) != EOF) */

fclose disconnects the logical file(handle) from the physical file. If not done by the programmer, code found in the exit sequence of the C runtime guarantees that all opened files are closed at the end of the program. Leaving it to the exit sequence has two disadvantages, though.

  1. There is a maximum number of files that an application can have open at the same time. If we defer closing of the files to the exit sequence, we may reach this limit more frequently and quickly.
  2. Unless you explicitly flush using fflush, data you write to a file is actually written to a buffer in memory, not to disk. It is flushed automatically whenever you write a newline to the file or the buffer area becomes full. It means that if there occurs a catastrophic failure, such as an outage, between the earliest time you could close the file and the time exit sequence closes it at the end of the program, the data left in the buffer area will not have been committed to the disk. Not a perfect success story!

So you should either flush explicitly or close the file as early as possible.


} /* end of void strip(const char *) */

Heap Memory Allocation edit

Problem: Write an encryption program that reads from a file and writes the encoded characters to some other file. Use this simple encryption scheme: The encrypted form of a character c is c ^ key[i], where key is a string passed as a command line argument. The program uses the characters in key in a cyclic manner until all the input has been read. Re-encrypting encoded text with the same key produces the original text. If no key or a null string is passed, then no encryption is done.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOO_FEW_ARGS 21
#define TOO_MANY_ARGS 22

const char *usage = "USAGE: CyclicEncryption inputfilename [key [outputfilename]]";
const char file_ext[] = ".enc";

struct FILES_AND_KEY {
  char *infname;
  char *outfname;
  char *key;

typedef struct FILES_AND_KEY f_and_k;

f_and_k getfilenameandkey(int argc, char **argv) { 
  f_and_k retValue = { "\0", "\0", "\0" };

  switch (argc) {
    case 1:
      fprintf(stderr, "No file name passed!\n %s", usage);

malloc is used to allocate storage from the heap, the area of memory that is managed by the programmer herself. This implies that it is the responsibility of the programmer to return every byte of memory allocated through malloc to the pool of available memory.

Such memory is indirectly manipulated through the medium of a pointer. Different pointers can point to the same region of memory. In other words, they can share the same object.

Using pointers to manipulate heap objects does not mean that pointers can point only to heap objects. Nor does it mean that pointers themselves are allocated in the heap. Pointers can point to non-heap objects and they can reside in the static area or the run-time stack. Indeed, such was the case in the previous examples.

    case 2:
      retValue.infname = (char *) malloc(strlen(argv[1]) + 1);
      strcpy(retValue.infname, argv[1]);
      retValue.outfname = (char *) malloc(
      strlen(argv[1]) + strlen(file_ext) + 1);
      strcpy(retValue.outfname, argv[1]);
      strcat(retValue.outfname, file_ext);
    case 3:
      retValue.infname = (char *) malloc(strlen(argv[1]) + 1);
      strcpy(retValue.infname, argv[1]);
      retValue.key= (char *) malloc(strlen(argv[2]) + 1);
      strcpy(retValue.key, argv[2]);
      retValue.outfname = (char *) malloc(
      strlen(argv[1]) + strlen(file_ext) + 1);
      strcpy(retValue.outfname, argv[1]);
      strcat(retValue.outfname, file_ext);
    case 4:
      retValue.infname = (char *) malloc(strlen(argv[1]) + 1);
      strcpy(retValue.infname, argv[1]);
      retValue.key = (char *) malloc(strlen(argv[2]) + 1);
      strcpy(retValue.key, argv[2]);
      retValue.outfname = (char *) malloc(strlen(argv[3]) + 1);
      strcpy(retValue.outfname, argv[3]);
      fprintf(stderr, "Too many arguments!\n %s", usage);
  } /* end of switch(argc) */

  return retValue;
} /* end of f_and_k getfilenameandkey(int, char**) */

void encrypt(f_and_k fileandkey) {
  int i = 0, keylength = strlen(fileandkey.key);
  int next_ch;
  FILE *infile;
  FILE *outfile;

  if (keylength == 0) return;

The following line opens a file that can be read byte-by-byte (binary mode), not character-by-character (text mode). In an operating system like UNIX, where each and every character is mapped to a single entry in the code table, there is no difference between the two. But in MS Windows, where newline is mapped to carriage return followed by linefeed, there is a difference, which may escape incautious programmers. The following C program demonstrates this. Run it with a multi-lined file and you will see that the number of fgetc's needed will be different. The number of characters you read will be less than the number of bytes you read. This discrepency, which is a consequence of the difference in the treatment of the newline character, is likely to cause a great headache when you move your working code from Linux to MS Windows.


#include <errno.h> #include <stdio.h> #include <string.h> #define CANNOT_OPEN_INPUT_FILE 1 int main(void) { int counter; int next_ch; FILE *in1, *in2; in1 = fopen("Text.txt", "r"); if (!in1) { fprintf(stderr, "Error in opening file! %s\n", strerror(errno)); return(CANNOT_OPEN_INPUT_FILE); } /* end of if(!in1) */ counter = 1; while((next_ch = fgetc(in1)) != EOF) printf(" %d\t", counter++, next_ch); printf("\n"); fclose(in1); in2 = fopen("Text.txt", "rb"); if (!in2) { fprintf(stderr, "Error in opening file! %s\n", strerror(errno)); return(CANNOT_OPEN_INPUT_FILE); } /* end of if(!in2) */ counter = 1; while((next_ch = fgetc(in2)) != EOF) printf(" %d\t", counter++, next_ch); fclose(in2); return(0); } /* end of int main(void) */


First line Second line Third Line Fourth and last line

If you compile and run the program listed on the previous page in a Unix-based environment, it will produce identical outputs for both (text and binary) modes. For MS Windows and DOS environments they will produce the following output.

gcc –o Newline Newline.c↵
Newline↵ 70 ...
... 10 83 ...
... 101 70 ...
... 13 10 83 ...
... 101
  infile = fopen(fileandkey.infname, "rb");
  if (!infile) {
    fprintf(stderr, "Error in opening file %s: %s\n",
    fileandkey.infname, strerror(errno));
  } /* end of if (!infile) */

  outfile = fopen(fileandkey.outfname, "wb");
  if (!outfile) {
    fprintf(stderr, "Error in opening output file %s: %s\n", 
    fileandkey.outfname, strerror(errno));
  } /* end of if (!outfile) */

  while ((next_ch = fgetc(infile)) != EOF) {
    fprintf(outfile, "%c",(char) (next_ch ^ fileandkey.key[i++]));
    if (keylength == i) i = 0;
  } /* end of while ((next_ch = fgetc(infile)) != EOF) */

  fclose(outfile); fclose(infile);

} /* end of void encrypt(f_and_k) */

int main(int argc, char *argv[]) {
  f_and_k fandk = getfilenameandkey(argc, argv);

  fandk.infname = fandk.outfname; 
  fandk.outfname = (char *) 
  malloc(strlen(fandk.infname) + strlen(file_ext) + 1);
  strcpy(fandk.outfname, fandk.infname);
  strcat(fandk.outfname, file_ext);

} /* end of int main(int, char **) */

Pointer to Function (Callback) edit

Problem: Write a generic bubble sort routine. Test your code to sort an array of ints and character strings.

#ifndef GENERAL_H
#define GENERAL_H


#define BOOL char
#define FALSE 0
#define TRUE 1
typedef void* Object;

The following typedef statement defines COMPARISON_FUNC to be a pointer to a function that takes two arguments of Object (that is void*) type and returns an int. After this definition, in this header file or any file that includes this header file, we can use COMPARISON_FUNC just like any other data type. That actually is what we do in Bubble_Sort.h and Bubble_Sort.c. Third argument of the bubble_sort function is defined to be of type COMPARISON_FUNC. That is, we can pass address of any function conforming to the prototype stated below.

void* is used as a generic pointer. In other words, data pointed to by the pointer is not assumed to belong to a particular type. In a sense, it serves the same purpose the Object class at the top of the Java class hierarchy does. Just as any object can be treated to belong in the Object class, any value, be that something as simple as a char or something as complex as a database, can be pointed to by this pointer. But, such a pointer cannot be dereferenced with the * or subscripting operators. Before using it one must first cast the pointer to an appropriate type.

typedef int (*COMPARISON_FUNC)(Object, Object);


#include "General.h"

bubble_sort function sorts an array of Objects (first argument), whose size is provided as the second argument. Now that there is no universal way of comparing two components and we want our implementation to be generic, we must be able to dynamically determine the function that compares two items of any type. Dynamic determination of the function is made possible by using a pointer to a function. Assigning different values to this pointer enables using different functions, which means different behavior for different types. Coming up with parameter types that are common to all possible data types is the key to making this pointer to function work for all. For this reason, comp_func takes two arguments of type Object, that is void*, which can be interpreted to belong to any type.

extern void bubble_sort (Object arr[], int sz, COMPARISON_FUNC comp_func);


#include <stdio.h>

#include "General.h"
#include "algorithms\sorting\Bubble_Sort.h"

The static qualifier in the following definition limits the visibility of swap to this file. In a sense, it is what private qualifier is to a class in OOPL’s. The difference lies in the fact that a file is an operating system concept (that is, it is managed by OS), while a class is a programming language one (that is, it is managed by the compiler). The latter is certainly a higher-level abstraction. In the absence of a higher-level abstraction, it (higher-level abstraction) can be simulated using lower level one(s). Intervention of an external agent is required to provide this simulation, which gives rise to a more fragile solution. Such is the case in this solution: the programmer, the intervening agent, has to simulate this higher-level of abstraction through using some programming conventions.

static void swap(Object arr[], int lhs_index, int rhs_index) {
  Object temp = arr[lhs_index];
  arr[lhs_index] = arr[rhs_index];
  arr[rhs_index] = temp;

} /* end of void swap(Object[], int, int) */

void bubble_sort (Object arr[], int sz, COMPARISON_FUNC cmp) {
  int pass, j;

  for(pass = 1; pass < sz; pass++) {
    BOOL swapped = FALSE;

    for (j = 0; j < sz - pass; j++)

We could have written the following line as follows:

if (cmp(arr[j], arr[j + 1]) > 0) {

and it would still do the same thing. So, a pointer-to-function variable can be used just like a function: simply use its name and enclose the arguments in parentheses. This will cause the function pointed to by the variable to be called. Nice thing about calling a function through a pointer is the fact that you can call different functions by simply changing the value of the pointer variable. If you pass the address of compare_ints to the bubble_sort function, it will call compare_ints; if you pass the address of compare_strings, it will call compare_strings; if ...

      if ((*cmp)(arr[j], arr[j + 1]) > 0) {
        swap(arr, j, j + 1);
        swapped = TRUE;
      } /* end of if ((*cmp)(arr[j], arr[j + 1]) > 0) */

    if (!swapped) return; 
  } /* end of outer for loop */
} /* void bubble_sort(Object[], int, COMPARISON_FUNC) */

#include <stdio.h>
#include <string.h>

#include “General.h”

The following line brings in the prototype of the bubble_sort function, not the source code or the object code for it.

The compiler utilizes this prototype in type-checking the function use. This involves checking the number of parameters, their types, whether it [the function] is used in the right context. As soon as this is confirmed, the linker takes over and [if it is in a separate source file] brings in the object code for bubble_sort. So,

  1. Preprocessor preprocesses the source code by replacing directives with C source code.
  2. Compiler, using the provided meta-information (such things as variable declarations/definitions and function prototypes), checks the syntax and semantics of the program. Note that by the time the compiler takes over, all preprocessor directives will have been replaced with C source. That is, the compiler does not know anything about the preprocessor.
  3. Linker combines object files to a single executable file. This executable is later loaded into memory by the loader and run under the supervision of the operating system.
#include "algorithms\sorting\Bubble_Sort.h"

typedef int* Integer;

void print_args(char *argv[], int sz) {
  int i = 0;

  if (sz == 0) { 
    printf("No command line args passed!!! Unable to test strings...\n");
  for (; i < sz; i++)
    printf("Arg#%d: %s\t", i + 1, argv[i]);
} /* end of void print_args(char **, int) */

void print_ints(Integer seq[], int sz) {
  int i = 0;

  for (; i < sz; i++) 
  printf("Item#%d: %d\t", i + 1, *(seq[i]));
} /* end of void print_ints(Integer[], int) */

Next two functions are needed for making comparisons in the implementation of the sorting algorithm. They compare two values of the same type.

Implementer of the bubble_sort function cannot know in advance the countless number of object types to be sorted and there is no universal way of comparing that works for all types. So, user of the sorting algorithm must implement the comparison function and let the implementer know about this function. Implementer makes use of this function to successfully provide the service. In doing this it calls “back” to the user’s code. This type of a call is therefore named a callback.

int compare_strings(Object lhs, Object rhs) {
  return(strcmp((char *) lhs, (char *) rhs));
} /* end of int compare_strings(Object, Object) */

Looks like a very complicated way of comparing two ints? That’s right! But remember: we have to be able to compare two objects (values) of any type and we must do it with one single function signature. Comparison of two ints is straightforward: just compare the values. But, how about character strings? Comparing the pointers will not yield an accurate result; we must compare the character strings pointed to by the pointers. It gets even more complex as we compare two structures that have nested structures in themselves. The solution to this is to pass the ball to the person who knows it best (the user of the algorithm) and while doing so pass a generic pointer (void *) to the data, not the data itself. In the process, the user will cast it into an appropriate type and make the comparison accordingly.

int compare_ints(Object lhs, Object rhs) {
  Integer ip1 = (Integer) lhs;
  Integer ip2 = (Integer) rhs;

  if (*ip1 > *ip2) return 1;
  else if (*ip1 < *ip2) return -1;
    else return 0;
} /* end of int compare_ints(Object, Object) */

int main(int argc, char *argv[]) {
  int seq[] = {1, 3, 102, 6, 34, 12, 35}, i;
  Integer int_seq[7];

  for(i = 0; i< 7; i++) int_seq[i] = &seq[i];

  printf("\nTESTING FOR INTEGERS\n\nBefore Sorting\n");
  print_ints(int_seq, 7);

The third argument passed to bubble_sort function is a pointer-to-function. This pointer contains a value that can be interpreted as the entry point of a function. Conceptually, there is not much of a difference between such a pointer and pointer to some data type. The only difference is in the memory region they point to. The latter points to some address in the data segment while the former to some address in the code segment. But one thing remains the same: the address value is always interpreted [in a particular way].

Note that you do not have to apply the address-of operator when you send a function as an argument.

  bubble_sort((Object*) int_seq, 7, &compare_ints);
  printf("\nAfter Sorting\n");	print_ints(int_seq, 7);

  printf("\nTESTING FOR STRINGS\n\nBefore Sorting\n");
  print_args(&argv[1], argc - 1);
  bubble_sort((Object*) &argv[1], argc - 1, compare_strings);
  printf("After Sorting\n");	print_args(&argv[1], argc - 1);

} /* end of int main(int, char **) */

Linking to an Object File edit

cl /c /ID:\include Bubble_Sort.c↵

/I adds D:\include to the head of the list of directories to be searched for header files, which initially include the directory of the source file and the directories found in %INCLUDE%. Analogous to CLASSPATH of Java, it is used in organizing header files. Using this information, the preprocessor brings D:\include\algorithms\sorting\Bubble_Sort.h into the current file (Bubble_Sort.c). Once the preprocessor is through with its job, compiler will try to compile the resulting file and output an object file, named Bubble_Sort.obj.

cl /FeTest_Sort.exe /ID:\include Bubble_Sort.obj Pointer2Function.c↵

The above command compiles Pointer2Function.c as was explained in the previous paragraph. Once Pointer2Function.obj is created it is linked with Bubble_Sort.obj to form the executable named Test_Sort.exe. This linking is required to bring in the object code of bubble_sort function. Remember: inclusion of Bubble_Sort.h brought in the prototype, not the object code!

Modular Programming edit

Problem: Write a (pseudo) random number generator in C. It is guaranteed that only one generator is used by a particular application and different applications will be using it over and over again.

Now that our generator will be used by different applications, we had better put it in a separate file of its own so that, by linking with the client program, we can use it from different sources. This is similar to, if not same with, the notion of module supported in languages like Modula-2. The difference has to do with their abstraction levels: the module is an entity provided by the programming language and known to all of its users, while file is an entity provided by the operating system and known to all of its users. Programming language compiler (that is, an implementation of the programming language specification) being a user of OS-provided services and concepts means module concept is a higher abstraction.

Now that the module concept (higher abstraction) is not present in C, we need to simulate it using other, probably lower-level, abstraction(s). In this case, we use a file (lower abstraction). By doing so, we cannot regain all of what comes with a module, though. The notion of module remains unknown to the compiler; certain rules reinforced by the compiler in a modular programming language must be checked by the programmer herself/ himself. For example, interface and implementation of a module must be synchronized by the programmer, which is certainly an error-prone process.

None of the applications being expected to use more than one generator means that we can create the data fields related to the generator in the static data region; in order to identify which generator the function is acting on, we do not need to pass a separate, unique handle. This implies we do not need any creation or destruction functions. All we need to have is a function for initializing the generator and another for returning the next (pseudo) random number.

#ifndef RNGMOD_H
#define RNGMOD_H

extern void RNGMod_InitializeSeed(long int);
extern double RNGMod_Next(void);


#include "math\RNGMod.h"

static long int seed = 314159;
const int a = 16807;
const long int m = 2147483647;

Generating a (pseudo) random number is similar to iterating through a list of values: we basically start from some point and move through the values one by one. The difference lies in the fact that the values generated are to be computed using a function rather than retrieved from some memory location. In other words, a generator iterates through a list using computation in time while an iterator does this using computation in space.

Seen from this perspective, initializing a (pseudo) random number generator with a seed is analogous to creating an iterator over a list. By using different values for the seed we can guarantee the generator returns a different value. Using the same seed value will give the same sequence of (pseudo) random values. Such a use may be wanted for replaying the same simulation scenarios for different parameters.

void RNGMod_InitializeSeed(long int s) {
  seed = s;

} /* end of void RNGMod_InitializeSeed(long int) */

The following function computes the next number in the sequence. The number being computed using a well-known formula means that the sequence is actually not random. That is, it can be known beforehand. This is why such a number is usually qualified with the word ‘pseudo’.

What makes the sequence look random is the number of different values it produces in a row. Formula used in the following function will produce all values between 1 and m – 1 and then repeat the sequence. More than two billion values!

In practice, it doesn’t make sense to use these large numbers. We therefore choose // to normalize the value into [0..1).

double RNGMod_Next(void) {
  long int gamma, q = m / a;
  int r = m % a;

  gamma = a * (seed % q) - r * (seed / q);
  if (gamma > 0) seed = gamma;
    else seed = gamma + m;

  return((double) seed / m);
} /* end of double RNGMod_Next(void) */

#include <stdio.h>

#include "math\RNGMod.h"

int main(void) {
printf("TESTING RNGMod…\n");
printf("Before initialization: %g\n", RNGMod_Next());
printf("After initialization: %g\t", RNGMod_Next());
printf("%g\t", RNGMod_Next());
printf("%g\t", RNGMod_Next());
printf("%g\n", RNGMod_Next());

} /* end of int main(void) */

Building a Program Using make edit

As the number of files needed to compile/link a program increases, it becomes more and more difficult to track the interdependency between the files. One tool to tackle with such situations is the make utility found in UNIX environments. This utility automatically determines which pieces of a large program need to be recompiled and issues commands to recompile them.

Input to make is a file consisting of rules telling which files are dependent on which files. These rules generally take the following shape:

target : prerequisites
TAB command
TAB command
TAB ...

The preceding rule is interpreted as follows: if the target is out of date use the following commands to bring it up to date. A target is out of date if it does not exist or if it is older than any of the prerequisites (by comparison of last-modification times).


The following rule tells the make utility that RNGMod_Test.exe depends on RNGMod.o and RNGMod_Test.c. RNGMod_Test.exe is out of date if it does not exist or if it is older than RNGMod.o and RNGMod_Test.c. If any one of these two files is modified we have to update RNGMod_Test.exe by means of the command supplied in the next line.

Note the tab preceding the command is not there to make the file more readable; commands used to update the target must be preceded by a tab.

$@ is a special variable used to denote the target filename.

RNGMod_Test.exe : RNGMod.o RNGMod_Test.c
  gcc -o $@ -ID:\include RNGMod.o RNGMod_Test.c

RNGMod.o : RNGMod.c D:\include\math\RNGMod.h
  gcc -c -ID:\include RNGMod.c

.PHONY is a predefined target used to define fake goals. It ensures the make utility that the target is not actually a file to be updated but rather an entry point. In this example, we use clean as an entry point enabling us to delete all relevant object files in the current directory.

.PHONY : clean
  del RNGMod.o RNGMod_Test.exe

As soon as we save this file, all we have to do is to issue the make command. This command will look in the current directory for a file named makefile or Makefile. If you are using GNU version of make, GNUmakefile will also be tried. Once it finds such a file, make tries to update the target file of the first rule from the top.

gcc –c –ID:\include RNGMod.c
gcc –o RNGMod_Test.exe –ID:\include RNGMod.o RNGMod_Test.c
Time: 2.263 seconds
Testing RNGMod...
Before initialization: 0.458724
After initialization: 0.273923 0.822585 0.186277 0.754617

Note that the time it takes the make utility to complete its task may vary depending on the processor speed and its load. In case only RNGMod_Test.c may have been modified we will see the following output.

gcc –o RNGMod_Test.exe –ID:\include RNGMod.o RNGMod_Test.c
Time: 1.673 seconds

We may at times want the make utility to start from some other target. In such a case we have to provide the name of the target as a command line argument. For instance, if we need to delete the relevant object files found in the current directory, issuing the following command will do the job.

make clean↵
del RNGMod.o RNGMod_Test.exe
Time: 0.171 seconds

Above presentation is a rather limited introduction to the make utility. For more, see Manual of GNU make utility.

Object-Based Programming (Data Abstraction) edit

Problem: Write a (pseudo) random number generator in C. Your solution should enable a single application to use more than one generator (with no upper bound to this number) at the same time. We should also meet the requirement of the generator being used from different applications.

As was mentioned in the [#Modular Programmingmodular|programming section], second requirement can be met by providing the generator in a separate file of its own.

In order to enable the use of more than one generator, where the exact number is not known, we must devise a method to create the generators dynamically as needed (similar to what constructors do in OOPL’s). This creation scheme must also give us something to uniquely identify each generator (similar to handles returned by constructors). We should also be able to return generators that are not needed anymore (similar to what destructors do in OOPL’s without automatic garbage collection).

#ifndef RNGDA_H
#define RNGDA_H

As stated in the requirements we have to come up with a scheme that lets users have more than one generator coexisting in the same program. We should in some way be able to identify each and every one of these generators and tell it from the others. This means that we cannot utilize the same method we used in the previous example. We have to get related functions to behave differently depending on the state of the particular generator. This difference in behavior can be achieved by passing an extra argument, a handle on the present state of the generator. This handle should hide the implementation details of the generator from the user; it should have immutable characteristics that we can use to hide the mutable properties of the generator. Sounds like the notion of handle in Java? That’s right! Unfortunately, C does not have direct support for handles. We need some other, probably less abstract, notion to simulate it. This less abstract notion we will use is the notion of pointer: whatever the size of the data it is a handle on, its size does not change.

In the following lines we first make a forward declaration of a struct named _RNG and then define a new type as a pointer to this not-yet-defined struct. By using forward declarations, we do not betray any of the implementation details; we just let the compiler know that we have the intention of using such a type. By defining a pointer to this type, we put a barrier between the user and the implementer: the user has something (pointer, something size of which does not change) to access a generator (underlying object, the size of which can be changed by implementation decisions) indirectly. Freedom of change means the generator can be used without any reference to the underlying object, which is why we call this approach data abstraction any type defined in such a way abstract data type. Note the pointer argument passed to the function. Function behavior changes depending on the underlying object, which is indirectly accessed by means of this pointer. That’s why this style of programming is called object-based programming.

struct _RNG;
typedef struct _RNG* RNG;

extern RNG RNGDA_Create(long int);
extern void RNGDA_Destroy(RNG);
extern double RNGDA_Next(RNG);


#include <stdio.h>
#include <stdlib.h>

#include "math\RNGDA.h"

struct _RNG { long int seed; };

const int a = 16807;
const long int m = 2147483647;

RNG RNGDA_Create(long int s) {
  RNG newRNG = (RNG) malloc(sizeof(struct _RNG));
  if (!newRNG) {
    fprintf(stderr, "Out of memory...\n");
  newRNG->seed = s;

} /* end of RNG RNGDA_Create(long int) */

void RNGDA_Destroy(RNG rng) { free(rng); }

double RNGDA_Next(RNG rng) {
  long int gamma;
  long int q = m / a;
  int r = m % a;

  gamma = a * (rng->seed % q) - r * (rng->seed / q);
  if (gamma > 0) rng->seed = gamma;
    else rng->seed = gamma + m;

  return((double) rng->seed / m);
} /* end of double RNGDA_Next(RNG) */

#include <stdio.h>
#include "math\RNGDA.h"

int main(void) {
  int i;
  RNG rng[3]; 

  printf("TESTING RNGDA\n");
  rng[0] = RNGDA_Create(1245L);
  rng[1] = RNGDA_Create(1245L);
  rng[2] = RNGDA_Create(2345L);
  for (i = 0; i < 5; i++) {
    printf("1st RNG, %d number: %f\n", i, RNGDA_Next(rng[0]));
    printf("2nd RNG, %d number: %f\n", i, RNGDA_Next(rng[1]));
    printf("3rd RNG, %d number: %f\n", i, RNGDA_Next(rng[2]));
  } /* end of for (i = 0; i < 5; i++)*/

} /* end of int main(void) */

Notes edit

  1. ISO C permits whitespace to precede and follow the # character on the same source line, but older compilers do not.
  2. As we shall later see, void * is an exception to this.
  3. In Java, there are two right-shift operators: >> and >>>: While the former sign-extends its operand the latter does its job by zero-extending.
  4. This should not lead you to think that these phases cannot take place concurrently. What it tells is that phases must start in a certain order. For instance, once the preliminary design is ready, implementers can start implementation, but not before. But both teams must be in contact with each other and provide feedback. As designers make changes these are relayed to the implementation team; as problems surface in the implementation these are passed back to the design team. Note that same sort of relationship may be present between other teams in the software production cycle.