Programming Language Concepts Using C and C++/Dynamic Memory Management

Mathematicians, when asked to define a concept, do not tend to base their answers on limitations of daily life. A set is simply a collection of distinct elements and does not impose an implied ordering among them. An empty set is as “real” as a set with three or five elements. One can add an indefinite number of elements to a set. As a matter of fact, one can speak of infinite sets.

Unfortunately, finite nature of computers does not afford computer scientists the luxury of making such wild claims. You just cannot assume your data structures to have infinite sizes. Even worse, concerns about time and/ or space performance generally force you to put on the straitjacket of static data structures: your data structures cannot have a size greater than some predefined value. But, what if you make a poor forecast? Simple: your program crashes; you rebuild your program with a different value and cross your fingers, hoping it will not happen again. Or, in a self-defeating manner, your program allocates memory that’s never used.

Picture is not so grim, though. You have an alternative: dynamic memory management. Using dynamic memory management functions, one can grow and shrink the size of data structures dynamically as needed. These functions do their job by affecting a region of memory called the heap or free store.

Growing a data structure is done by means of a function such as malloc or new. These functions basically demand memory from a part of the language runtime, called memory allocator. If it can satisfy the request memory allocator returns a pointer (or handle) to the allocated region. Otherwise, a special value (such as NULL or nil) or an exception object is returned, indicating failure to fulfill the allocation demand.

Shrinking the data structure is more interesting. While some programming languages provide functions to explicitly release memory, some take responsibility and do it on behalf of the programmer. The latter group is said to have [automatic] garbage collection ([automatic] GC). Memory allocators of the programming languages belonging in the first group expect programmers to take initiative by calling functions such as free or delete. This is bad news for sloppy programmers. Not releasing unused memory is equal to wasting memory and will eventually give rise to crashes; as more and more memory is allocated and none is reclaimed, the pool of usable memory is shrunk and the allocator finds itself in a situation where it cannot satisfy an allocation request, which inevitably leads to the aforementioned result. Such unused memory is referred to as garbage. Converse of this, early releasing of memory, is equally disastrous. You are now faced with the danger of using a pointer that does not point to valid data. Such a pointer is called a dangling pointer.

So, why do people- at least some- keep using languages without garbage collection? Or, cannot they simply incorporate this facility into such a language as C? Answer to the first question is performance. Executing as a background thread (or process), garbage collector will take over either when all other threads are in a wait state or the program is out of memory and next allocation demand can only be satisfied by some garbage that is yet to be reclaimed.[1] This is a time-consuming process and won’t complete in a few machine cycles; as part of its execution, garbage collector will sweep through the entire heap memory and mark unused regions as garbage. This means invocation of the garbage collector will give rise to substantial drops in performance. On the other hand, explicit de-allocation by means of functions such as free or delete enable the programmer to return memory as it becomes unused. Although it is still an option there is no designated phase in the program where cycles are exclusively spent for returning unused memory in a wholesale fashion. This is depicted in the figure below.

Comparative performances of GC systems

As for the second question, the answer is negative. Being able to refer to the same memory region through different pointers, liberal use of casting in C makes it impossible for the allocator to figure out the contents of the memory. What if this region initially had some other pointer in it and now seen as a collection of raw bytes? So tracking down the pointer through memory to find unused regions and therefore garbage collection becomes impossible.[2]

Module

edit

In this example, we provide an implementation for character strings as an opaque type. On top of the previous presentation (Object-Based Programming), two levels of pointers in the representation of this type present us with the chance to emphasize correct allocation and de-allocation orders.

Interface

edit
String.h
#ifndef STRING_H
#define STRING_H

#define SUBSTR_NOT_FOUND -1
#define OUT_OF_MEMORY –2

Yet another forward declaration and its companion typedef! The former is used to forewarn the compiler about the existence of the record type that we will be working on and the latter provides a handle on an instance of such a record type. A type manipulated this way is called an opaque type; access to its instances is mediated by a pointer.

Observe that all formal parameters provided in the prototypes are '[constant] pointers to struct _STR', not 'struct _STR'. The reasoning for this is as follows: as a natural consequence of the information hiding principle, we should keep the record type representation as an implementation detail: the user of a facility need not concern herself about a detailed answer to the question "how" (implementation); she had better concentrate on finding an answer to the question "what" (interface). This can be accomplished by giving her things that do not change; things that may change should be made part of the implementation, not the interface. And size of pointer to a record type remains the same although size of the record type itself may change.

struct _STR;
typedef struct _STR *String;

extern String String_Create(const char*);
extern void String_Destroy(String *this);
extern int String_Compare(const String this, const String);
extern String String_Concat(const String this, const String);
extern int String_Contains(const String this, const String);
extern int String_Containsr(const String this, const String);
extern unsigned int String_Length(const String this);
extern String String_Substring(const String this, unsigned int start, unsigned int length);
extern char* String_Tocharp(const String this);

#endif

Implementation

edit
String.c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "General.h"
#include "utility/String.h"

In addition to the string [of characters] our representation provides the length of the string as a member field. By doing this we trade space for time: when asked the length of a string, instead of going through all the characters till we see the terminating NULL character, we return the value of this member field. In other words, we prefer computation in space to computation in time.

struct _STR {
  unsigned int length;
  char *str;
};

static String String__Reverse(const String this);

String String_Create(const char* source) {
  String this;
  unsigned int i;

In addition to the static data area, where data that remain alive throughout the program execution reside, and the runtime stack, where variables local to blocks live; we are provided with a pool of memory we can use during program execution. This memory region is called the program’s free store or heap. Unlike the other two areas, this area is managed by the programmer by means of dynamic memory management functions.

One aspect of free store memory is that it is unnamed. Objects allocated on the free store are manipulated indirectly through pointers. A second aspect of the free store is that the allocated memory is un-initialized.

Allocation of memory at run time is referred to as dynamic memory allocation; memory allocated by means of a function such as malloc is said to be allocated dynamically. However, this does not mean that storage for the pointer itself is allocated dynamically. It might well be the case that it is allocated statically.[3]

Definition: An object’s lifetime–that period of time during program execution when storage is bound to the object–is referred to as an object’s extent.

Variables defined at file scope are spoken of as having static extent. Storage is allocated before program start-up and remains bound to the variable throughout program execution. Variables defined at local scope are spoken of as having local extent. Storage is allocated on the run-time stack on entry to the local scope; on exit, this storage is freed. A static local variable on the other hand exhibits static extent.

Objects allocated on the free store are spoken of as having dynamic extent. Storage allocated through the use of functions like malloc in C remains bound to an object until explicitly deallocated by the programmer.

malloc returns a generic pointer, void*<>/source. Such a pointer cannot be dereferenced with the <syntaxhighlight lang="c" enclose="none">* or the subscripting operators. In order to use this pointer to productive ends we must cast the returned value to some particular pointer type. Assuming malloc can satisfy the request, local variable this contains an address value that is to be interpreted to indicate the starting address of a struct _STR object. Otherwise, it contains a polymorphic value, NULL, that signifies the failure of malloc.

Question:
Provided that the scheme presented in the Data-Level Structure chapter is used, what do you think is the alignment required for the region allocated by malloc?
  this = (String) malloc(sizeof(struct _STR));

In case there may not be enough memory left in the heap, NULL will be returned by the malloc function. We should let the user know about it.

Upon successful allocation of memory for the metadata, which is a combination of the pointer to the string contents and its length, we must allocate room for the string contents. We do this by using the malloc function for a second time. Eventually, if everything goes right, we have the figure given below.[4]

  if (this == NULL) {
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if(this == NULL) */

  this->length = strlen(source);
  this->str = (char *) malloc(this->length + 1);
  if (this->str == NULL) {
  free(this);
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (this->str == NULL) */

Next loop and the two lines following it are used to copy a string of characters into another memory region. We first, in the for loop, copy as many characters as the length of the string and then append the terminating NULL character. Finally, now that we have moved the pointer in the process and it points to the NULL character at the end of the string, we get it to point to the beginning of it using pointer arithmetic on the last line.

Note that we could have done this using the standard string function strcpy. Along with strcpy, we are given an assortment of string functions. In addition to these, since a string of characters is a memory region ending with the sentinel value of ‘\0’, we can deal with such an entity using the appropriate memory function. A partial list of such functions is provided below.

Friends of strcpy
char* strcat(char *dest, const char *src) Appends src to dest and returns the value accumulated in dest as its result.
char* strncat(char *dest, const char *src, size_t len) Same as strcat with the extra condition of limiting the accumulated string's length to len.
int strcmp(const char *s1, const char *s2) Returns the result of comparing its arguments lexicographically. Return value is 0 if both strings are equal. If s1 comes before s2, returnes value is less than 0. Otherwise, a positive value is returned.
int strncmp(const char *s1, const char *s2, size_t len) Same as strcmp with the extra condition of limiting the comparison to the first len characters of the argument strings.
char* strcpy(char *dest, const char *src) Copies src into dest and returns the final value of dest as its result.
char* strncpy(char *dest, const char *src, size_t len) Same as strncpy with the extra condition of limiting the number of copied characters to len.
size_t strlen(const char *s) Returns the length of its argument.
char* strchr(const char* s, int c): Searches the string s for the first occurrence of c. If successful a pointer to this location is returned. Otherwise, a null pointer is returned.
char* strrchr(const char* s, int c): Searches the string s in the reverse direction for the first occurrence of c. That is, it finds the last c in s. One point to keep in mind: strchr and strrchr consider the terminating null character to be a part of s for the purposes of the search.
char* strstr(const char* str, const char* sub): Locates the first occurrence of sub in str and returns a pointer to the beginning of this occurrence. If sub does not occur in str, a null pointer is returned.
int memcmp(const void *ptr1, const void *ptr2, size_t len) /* tokenizes str using the characters in set as separators */
void* memcpy(void *dest, const void *src, size_t len) /* tokenizes str using the characters in set as separators */
void* memchr(const void *ptr, int val, size_t len ) /* tokenizes str using the characters in set as separators */
  for (i = 0; i < this->length; i++) 
    *this->str++ = *source++;
  *this->str = '\0';
  this->str -= this->length;

  return(this);
} /* end of String String_Create(const char*) */

It is now time to release the area of memory pointed to by *this and return it to the free store for re-use. In C, this can be done through a couple of functions, one of which is free.

Our destructor is admittedly very simple. It turns out that heap memory is the only resource used by the String data type. It may not be the case for other data types. They may hold handles to resources like files, semaphores, and so on.

Given the definition of struct _STR, we have the following memory layout:


*this is what is called a handle. The use of a pointer (struct _STR *), in other words indirection, is needed to isolate the user from probable changes in the representation. Even when the implementer changes the underlying representation, the user will not be affected.[5] Because she does not have direct access to the representation. All she has is a ‘handle on a String object’, not the String object itself.

The region pointed to by a pointer should be deallocated only when everything pointed to by the pointers in the region have been deallocated. That’s why we should first deallocate (*this)->str and then *this. One thing to keep in mind: it is the region that is pointed by the pointer that gets freed, not the pointer itself!

By the time control reaches the return statement, we get the picture below. Shaded regions indicate memory returned to the allocator; they may be re-used in subsequent allocation requests. For this reason, it is not wise trying to use released memory.


Observe the signature of the destructor: The sole argument is of type String *, not String. This modification is made to ensure that assignment of NULL to the handle of the deleted object is permanent. The reason why we make this assignment in the destructor is to relieve user(s) of having to do it after the call to the destructor, which is—since there are many users who are concerned about what is provided by the module and one implementer who should be concerned about how the module is implemented—certainly more error prone.[6]

void String_Destroy(String* this) {
  if (*this == NULL) return;

  free((*this)->str);
  (*this)->str = NULL;
  free(*this);

  *this = NULL;
} /* end of void String_Destroy(String*) */

int String_Compare(const String this, const String str2) {
  return(strcmp(this->str, str2->str));
} /* end of int String_Compare(const String, const String) */

String String_Concat(const String this, const String str2) {
  String res_str;

  res_str = (String) malloc(sizeof(struct _STR));
  if (!res_str) {
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (!res_str) */


Our representation has two levels: one for the information about the underlying string and one for the string itself. This means we need to issue two separate memory allocation commands, one for each level. So, at this point in the program we have the memory layout given above. There are random values in the length and str fields, which are probably left over from previous uses of the areas allocated to them.

  res_str->length = this->length + str2->length;
  res_str->str = (char *) malloc(res_str->length + 1);

  if (!res_str->str) {
    free(res_str);
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (!res_str->str)*/

By the time control reaches this point we will have assigned the accurate value to the length field and allocated enough memory to hold the concatenation of the two arguments. Note that content pointed to by str is random. This doesn’t mean that they cannot be used though. Application of strlen function on str will still yield legitimate values, depending on where the first ‘/0’ appears in memory.


  while (*this->str) *res_str->str++ = *this->str++;


Finally we are over with copying the first argument into the resultant String. Well, almost! Each time we copy a character from this->str into res_str->str, we advance the pointer so that it points to the next character to be copied. By the time we reach the end of the source String (this->str), both pointers will be greater than their original contents by as much as the length of the source string. As far as res_str->str is concerned this is OK since a second string is yet to be appended to it. But the original value of this->str is to be restored, which is done by the following compound assignment statement.

We could have done away with this readjustment had we used a temporary pointer to point to the same location as this->str and copy the source String using this temporary pointer. The code for this is as follows:

String String_Concat(const String this, const String str2) { String res_str; char* temp_str = this->str; ... ... while (*temp_str) *res_str->str++ = *temp_str++; ... ... } /* end of String String_Concat(const String, const String) */

  this->str -= this->length;

  while (*str2->str) *res_str->str++ = *str2->str++;
  str2->str -= str2->length;
  *res_str->str = '\0';
  res_str->str -= res_str->length;

  return(res_str);

Having copied the argument Strings into the resultant String we now return to the caller. Note that we return a pointer to some object that resides in heap. This gives us the opportunity to share the same object among function calls, something that we wouldn’t have had had we chosen to use an object that resides in the run-time stack: An object in the run-time stack is accessible only in the function it is created and those functions that are directly and indirectly called from this function.

This may lead us to think that returning a pointer to some object in the static data region or the frame of main function is a safe course. As far as lifetime of the object is concerned that’s right.[7] But we now face the limitation imposed by the static nature of the allocation: Objects allocated in these regions cannot dynamically change in size. Size of an object created in the static data region is fixed at compile-time while an object created in the run-time stack must have its size fixed at the point its definition is elaborated. This implies the only way of implementing dynamic data structures is through the use of heap memory.[8], [9]


} /* end of String String_Concat(const String, const String) */

int String_Contains(const String this, const String substr) {
  int i;
  unsigned int j;

  for(i = 0; i <= this->length - substr->length; i++) { 
    for (j = 0; j < substr->length; j++) 
      if (substr->str[j] != this->str[i + j]) break;
    if (j == substr->length) return(i);
  } /* end of outer for loop */

  return(SUBSTR_NOT_FOUND);
} /* end of int String_Contains(const String, const String) */

int String_Containsr(const String this, const String substr) {
  String this_rv = String__Reverse(this);
  String sub_rv = String__Reverse(substr);
  int where;

  if (this_rv == NULL || sub_rv == NULL) {
    fprintf(stderr, "Out of memory...\n");
    return(OUT_OF_MEMORY);
  } /* end of if (this_rv == NULL || sub_rv == NULL) */

Backward search can be formulated in terms of forward search. That’s exactly what we do here. We make a forward search of the reversed substring in the reversed string and return a value depending on the result of this search.


  where = String_Contains(this_rv, sub_rv);

  String_Destroy(&sub_rv);
  String_Destroy(&this_rv);
  if (where >= 0) return(this->length - substr->length - where);
    else return(SUBSTR_NOT_FOUND);
} /* end of int String_Containsr(const String, const String) */

unsigned int String_Length(const String this) {
  return(this->length);
} /* end of unsigned int String_Length(const String) */

String String_Substring(const String this, unsigned int start, unsigned int len) {
  String res_str;
  unsigned int i;

  if (start >= this->length || len == 0) return(String_Create(""));
  if (start + len > this->length) len = this->length - start;
  res_str = (String) malloc(sizeof(struct _STR));
  if (res_str == NULL) {
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (res_str == NULL) */
  res_str->str = (char *) malloc(len + 1);
  if (res_str->str == NULL) {
    free(res_str);
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (res_str->str == NULL) */

  for (i = 0; i < len; i++)
    res_str->str[i] = this->str[start + i];
  res_str->str[i] = '\0';
  res_str->length = len;

  return(res_str);
} /* end of String String_Substring(const String, unsigned int, unsigned int) */

The following is what we call a user-defined conversion function: it converts from String to char*. Unlike C++, where the compiler implicitly calls such functions, this conversion function must be explicitly invoked by the programmer.

char* String_Tocharp(const String this) {
  char *res_str = (char *) malloc(this->length + 1);
  if (!res_str) {
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if(!res_str) */
  strcpy(res_str, this->str);

  return(res_str);
} /* end of char* String_Tocharp(const String) */

static String String__Reverse(const String this) {
  String str_reverse;
  unsigned int i;

  str_reverse = (String) malloc(sizeof(struct _STR));
  if (!str_reverse) {
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (!str_reverse) */
  str_reverse->str = (char *) malloc(this->length + 1);
	
  if (!str_reverse->str) {
    free(str_reverse);
    fprintf(stderr, "Out of memory...\n");
    return(NULL);
  } /* end of if (!str_reverse) */
  str_reverse->length = this->length;

  for (i = 0; i < this->length; i++)
    str_reverse->str[i] = this->str[this->length - 1 - i];
  str_reverse->str[this->length] = '\0';

  return(str_reverse); 
} /* end of String String__Reverse(const String) */

Test Program

edit
String_Test.c
#include <stdio.h>
#include <stdlib.h>

#include "General.h"
#include "utility/String.h"

int main(void) {
  char *sztmp;
  int loc;
  String first_name, last_name, name;
  String strtmp, strtmp2;

  first_name = String_Create("Tevfik");

For printing the contents of a String variable, we convert it to a C-style character string and pass this to the printf function. But we have to be careful about avoiding garbage creation. If we pass the return value, a pointer to char, of String_Tocharp to printf without storing it in a variable, it will be lost by the time printf returns. This will turn the memory region pointed to by this pointer into garbage. Something we don’t want! For this reason, we first assign the return value to some temporary variable and then send it to the printf function. As soon as printf returns, region pointed to by the temporary variable is freed using free.

  printf("First name: %s", (sztmp = String_Tocharp(first_name)));
  printf("\tLength: %d\n", String_Length(first_name));
  free(sztmp);

  last_name = String_Create("Aktuglu");
  printf("Last name: %s", (sztmp = String_Tocharp(last_name)));
  printf("\tLength: %d\n", String_Length(last_name));
  free(sztmp);

  printf("Forward search for u in the last name: ");
  loc = String_Contains(last_name, strtmp = String_Create("u"));
  if (loc  == SUBSTR_NOT_FOUND)
    printf("u not found...Sth wrong with the String_Contains function!!!\n");
    else printf("u found at location %d\n", loc);
  String_Destroy(&strtmp);

  printf("Backward search for u in the last name: ");
  loc = String_Containsr(last_name, strtmp = String_Create("u"));
  if (loc == SUBSTR_NOT_FOUND)
    printf("u not found...Sth wrong with the String_Containsr function!!!\n");
    else printf("u found at location %d\n", loc);
  String_Destroy(&strtmp);

  name = String_Concat((strtmp = String_Concat(first_name, (strtmp2 = String_Create(" ")))), last_name);
  String_Destroy(&strtmp); String_Destroy(&strtmp2);
  printf("Name: %s", (sztmp = String_Tocharp(name)));
  printf("\tLength: %d\n", String_Length(name));
  free(sztmp);

  strtmp = String_Substring(name, 0, String_Length(first_name));
  printf("Comparing first name with the first %d characters of the name...",  String_Length(first_name));
  if (String_Compare(first_name, strtmp) == 0) printf("Equal\n");
    else printf("Not equal\n");

  String_Destroy(&strtmp); String_Destroy(&first_name);
  String_Destroy(&last_name); String_Destroy(&name);

  exit(0);
} /* end of int main(void) */

Debugging a Program with GDB

edit

A program compiled and linked with no errors does not necessarily mean that everything is okay. A logical error may have crept into the program and our executable will possibly produce wrong results. In such cases, we should go back to the source code and try to fix the glitch. The problem with this approach is the difficulty of deciding where to begin the search for the malicious bug. What if we are working on a large project involving the interaction of hundreds of functions and the logical error in question rears its head far away from its origin? We had better find an easier way!

A debugger is the answer to our worries. It lets us walk through the problematic software, modify the code on the fly, and pinpoint the potential trouble spots. What follows is an introduction to the GNU Debugger, GDB.

Compiling Source Programs for Debugging

edit

In order to debug a program, you need to generate debugging information when you compile it. This debugging information is stored in the object file; it contains meta-data to describe such things as the type of each variable and function, the correspondence between source line numbers and addresses in the executable code.

To request insertion of debugging information into the object code, you must specify the ‘-g’ option when you run the compiler. According to this, String_Test.c would be run by the following commands:

gcc –c –g –ID:\include String.c[10]
gcc –o String_Test.exe –g –ID:\include String_Test.c
String_Test

[Before moving ahead it should be stressed that—since certain optimizations involve code elimination/modification, which means the source code you debug and what the debugger shows may not match each other—enabling optimization together with the debugging option is not a good idea. Therefore, it is well-advised that you first turn off optimization (and turn on the debugging switch) and then, when you are convinced the code is throughly tested and it is time to hit the market-place, turn optimization on.]

In case something may go wrong in the program, issue the command given on the next line.

gdb String_Test.exe

This command will put you in the GDB environment and wait for you to issue GDB commands. You can also debug a crashed process and try to figure out what went wrong by passing the core dump[11] of the process as an extra argument. One can also monitor a running process. So, the gdb command can be summarized as follows:

gdb [''executable_file'' [''core_dump'' | ''process_id'']]

As can be seen from the above specification, gdb can be issued without any arguments. In such a case, you have to supply the argument values by means of GDB commands.

gdb String_Test.exe
gdb
GNU gdb 5.0
Copyright 2000 Free Software Foundation. Inc.
...
(gdb) file String_Test.exe

Similarly, a core dump can be supplied with the core-file command and GDB can attach itself to a process using the attach command.

Major GDB Commands

edit

A command is a single line of input with no limit to its length. It starts with a command name and is followed by arguments whose meaning depends on the command name.

GDB command names may always be truncated if that abbreviation is unambiguous. A time-saver for frequently used commands, an abbreviation with no ambiguous interpretation will surely execute the intended command.[12]

Entering a blank line as input to GDB (typing just RET) generally means to repeat the previous command. # is used to start a single line comment; anything from # to the end of the line is treated as a comment and therefore discarded.

Getting Help from GDB

edit

With so many commands in GDB, one may have difficulty remembering a particular one. In such cases you may ask GDB for information on its commands. All you need to know is the help[13] command, which comes in three flavors:

help
Displays a short list of command classes.
help command_class
Displays a list of commands found in a particular command class.
help command
Displays a short paragraph on how to use a specific command.
apropos reg_exp
Searches through all the GDB commands and their documentation for the regular expression specified in reg_exp. For example,
(gdb) apropos local var*↵
backtrace -- Print backtrace of all stack frames
bt -- Print backtrace of all stack frames
info locals -- Local variables of current stack frame
info locals -- Local variables of current stack frame
where -- Print backtrace of all stack frames
complete command_prefix
Lists all possible commands that start with command_prefix.

GDB can fill in the rest of a word in a command for you, if there is only one possibility; it can also show you what the possibilities are for the next word in a command, at any time. In order to get gdb to do command completion you must press the TAB key. In case there is no ambiguity gdb will fill in the word and wait for you to finish the command. Otherwise, it will sound a bell telling you about the possibilities. At this point you can either supply more characters and try again, or just press TAB a second time. Pressing TAB a second time displays all command names starting with the prefix you have entered, which is actually what is offered by the complete command. For example,

(gdb) h TAB TAB
handle hbreak help
(gdb) he TAB → (gdb) help

Stopping Execution

edit

Unless you tell otherwise, once a program starts it runs to completion. For a program being debugged this sort of behavior is not very helpful. We should be able to stop at certain points to check program state, iterate through the program statements, and probably modify code by means of interfering with the control flow of the program.

Stopping at certain points of the program can be achieved by setting breakpoints. In GDB, this is done by the break command. break can be used in many different ways.[14]

break (func_name | filename:func_name)
Sets a breakpoint at entry to the function named func_name [of the file named filename].
break (line_num | filename:line_num)
Sets a breakpoint at line line_num [of the file named filename].
break *address
Sets a breakpoint at address address. * prefix is used to signal that what comes next is to be interpreted as an address value.
break (+ | -) offset
Using the argument value as an offset, this command sets a breakpoint relative to the current source line.
break
Sets a breakpoint at the next instruction to be executed. Equivalent to break +0.
break location if condition
Sets a breakpoint at a location, which can be provided in different ways as described above. GDB will take control of the program (that is, the program will stop) only if condition evaluates to a non-zero value. Note that the unconditional versions of break can be expressed using a conditional that always evaluates to a nonzero value. That is, previous break commands are equivalent to break location if n, where n is a non-zero value.

Above group of alternative usages can be compactly defined as

break [location] [if condition]

where [ and ] are used to express zero or one instance of the enclosed syntactic units.

tbreak location [if condition]
Sets a temporary breakpoint, which will be deleted after the first time the program stops at location specified as the argument. Typically, one would put such a breakpoint at the entry point to a program.
hbreak location [if condition]
Sets a hardware-assisted breakpoint. Due to the lack of hardware support, GDB may not be able to set this type of breakpoints. Even when it is supported, the number of registers used for this purpose may not be enough to meet your demands. In such a case, you have to delete or disable an unused breakpoint and reuse it for the new one.
thbreak location [if condition]
Sets a temporary, hardware-assisted breakpoint.
rbreak reg_exp
Sets breakpoints on all functions matching the regular expression reg_exp. Keep in mind that there is an implicit ".*" leading and trailing the regular expression you supply. So, issued while debugging String_Test.exe,
rbreak .*Cont.*↵ and rbreak Cont↵
are equivalent and will both set breakpoints at String_Contains and
String_Containsr
.

A watchpoint is a special breakpoint that stops your program when the value of an expression changes. This relieves you from the burden of predicting places where such changes may occur. It comes in three flavors:

watch expr
Sets a watchpoint that will break whenever expr is written into by the program and its value changes.
rwatch expr
Sets a watchpoint that will break whenever expr is read by the program.
awatch expr
Sets a watchpoint that will break whenever expr is accessed (read or written into) by the program.

A catchpoint is another special breakpoint that stops your program when a certain kind of event occurs, such as the throwing of a C++ exception or the loading of a library.

An unconditional breakpoint, be that a breakpoint, watchpoint, or catchpoint, can be turned into a conditional one by means of the condition command.

condition breakpoint_num [expression]
Adds the expression as a condition for the breakpoint specified by breakpoint_num. If there is no expression part, then any condition set for the breakpoint is removed. That is, it becomes an ordinary unconditional breakpoint.

Here, breakpoint number is an index used to refer to a particular breakpoint. It can be found out by issuing the info breakpoints command.

One can give any breakpoint a series of commands to execute when the program stops due to that breakpoint.

commands [breakpoint_num]
command1
command2
...
commandn
end
Missing breakpoint number will cause the commands to be attached to the last breakpoint set (not encountered!). Following commands and breakpoint number with end can easily accomplish removal of a command list from a breakpoint.

Effect of commands can be partially obtained by display command, which adds its argument expression to a so-called automatic display list. All items in this list are printed each time the program is stopped.

Having figured out the problem in our program we may want to remove a breakpoint. This can be done by using clear and delete commands.

clear
Removes breakpoints set at the instruction about to be executed.
clear (function | filename:function)
Removes the breakpoint set at the entry of the function passed as the argument.
clear (line_num | filename:line_num)
Removes any breakpoint set at or within the code of the specified line.
delete [breakpoints] [list_of_breakpoints]
Removes the breakpoints passed as arguments. With no argument, it deletes all the breakpoints.

Instead of removing a breakpoint we may choose to ignore or disable it. Such a breakpoint will be present but not effective, awaiting its revival.

ignore breakpoint_num ignore_count
Causes the breakpoint referred to by breakpoint_num to be bypassed ignore_count times.
disable [breakpoints] [list_of_breakpoints]
Causes the given breakpoints to be ignored till a relevant enable command. If no list is supplied, all breakpoints are disabled.
enable [breakpoints] [list_of_breakpoints]
Activates the previously disabled list of breakpoints.
enable [breakpoints] once list_of_breakpoints
Activates the given list of breakpoints for only once.
enable [breakpoints] delete list_of_breakpoints
Activates the given list of breakpoints to work once, and then die. GDB will delete any breakpoint in the list as soon as the program stops there.

Resuming Execution

edit

Once a program has been stopped, it can be resumed in different ways. Following is a list of such commands:

next [no_of_repetitions]
Continues to the next source line. If the current line to be executed contains a function call, it is executed in one single step without going into it. In other words, it never increases the depth of the run-time stack. Supplied argument tells GDB the number of times to execute the next command.
step [no_of_repetitions]
Continues to the next source line. Unlike next, if the current line to be executed contains a function call, a new stack frame will be inserted and control will flow into the very first executable statement in the called function.
nexti [no_of_repetitions]
Executes the next machine instruction and returns to the debugger. If the next instruction is a function call, it executes the entire function and then returns.
stepi [no_of_repetitions]
Executes the next machine instruction and returns to the debugger.
continue [ignore_counts]
Continues running the program up to the next breakpoint. When passed an argument, it is taken to mean the number of times the debugger will ignore the breakpoint set at the most recently encountered location for ignore_count – 1 times. So, continue is equivalent to continue 1.
until [location]
When passed an argument, until continues running the program either until the specified location or end of the current function is reached. When used without an argument, location is assumed be the current line. It is very useful for avoiding stepping through a loop.
finish
Continues running the current function to completion.
return [ret_value]
Returns immediately to the caller without executing the remaining statements in the callee—that is, the current function. If it’s a value-returning function, ret_value is interpreted as the return value of the callee.
call expr
Evaluates the expression passed as its only argument without changing the current location. Note that side-effects due to evaluation is permanent.
jump (line_no | *address)
Resumes execution at line or address specified in the argument.

The diagram below shows the effect of these commands. A dashed line is used to indicate a change in state where effect of the transition is achieved by visiting all the intermediate states. For example, “finishing f1” means executing the remaining statements in f1 and then returning, which is suggested by the dashed line. “returning from f2”, however, is shown by a straight line, meaning it skips all statements in f2 and returns control to main.

 

Tracing a Running Program

edit

There are times when the very act of stopping the program may affect the correctness of the program or may degrade the quality of service. An example to the former is a real-time program, such as those found in embedded systems; an example to the latter is a server that should stay up 24 hours a day, 7 days a week. In such cases we may choose to use the trace and collect commands found in GDB. Using these and a few other commands, we can specify locations in the program and cause certain actions, such as data collection, to take place at these locations. Such locations are called tracepoints. Data collected at the tracepoints are later inspected to see what goes wrong within the program.

trace location
Sets a tracepoint at the specified location. Setting a tracepoint or changing its commands doesn’t take effect until the next tstart command.
tstart
Starts the trace experiment and begins collecting data.
tstatus
Displays the status of the current trace data collection.
info tracepoints [tracepoint_num]
Displays information about a particular tracepoint. Without an argument it displays information about all tracepoints defined so far. Among other information items, it provides the system assigned number of each tracepoint.
actions [tracepoint_num]
Defines a list of actions to be taken when the tracepoint numbered tracepoint_num is hit. With no arguments, tracepoint most recently defined will be used.
There are commands special to an action block. These are collect and while-stepping.
collect expr [, expression_list]
Collects values of the given expression(s) when the tracepoint is hit. In addition to expressions formed according to the syntax rules of the source language, you can use one of the following.
$regs Collects all registers.
$args Collects all function arguments.
$locals Collects all local variables.
while-stepping n
Performs n single-step traces after the tracepoint, collecting new data at each step.

Once enough data has been collected we may either stop the experiment, not the program, explicitly or get GDB to stop on its own using the passcount of each tracepoint.

tstop
Ends the trace experiment and stops collecting data.
passcount [n [tracepoint_num]]
Sets the passcount of a tracepoint. Passcount is the maximum number of hits a tracepoint can take. So, setting the passcount of a tracepoint to n will stop the trace experiment at or before the nth time that particular tracepoint is hit. How many times the other tracepoints may have been hit and whatever their passcounts may be, the very first tracepoint to be hit as many times as its passcount will stop the trace experiment. Missing tracepoint number assumes the most recently defined tracepoint. With no passcount value, the trace experiment will go on till it is stopped explicitly by the user.

Data collected each time a tracepoint is hit is called a snapshot. These snapshots are consecutively numbered from zero and go into a buffer so that you can examine them later.

tfind snaphot_num
Retrieves the trace snapshot numbered snapshot_num. Argument can be specified in a number of ways as given below:
start
Finds the first snapshot in the buffer. It’s a synonym for 0.
none
Stops debugging snapshots and resume live debugging.
end
Same as none.
ENTER
Finds the next snapshot in the buffer.
-
Finds the previous snapshot in the buffer.
tracepoint_num
Finds the next snapshot associated with tracepoint_num.
pc addr
Finds the next snapshot whose pc (program counter) value is addr.
outside addr1, addr2
Finds the next snapshot whose pc is outside the given range.
line [filename
]n
Finds the next snapshot associated with a certain source line.

A trace experiment typically involves the following steps.

  1. Set a tracepoint using the trace command.
  2. Using actions, attach a list of actions to the tracepoint.
  3. Set a passcount for the tracepoint, if needed.
  4. Repeat 1-3 for all other data collection points.
  5. Start the trace experiment by using tstart.
  6. Use tstop to stop the trace experiment or wait for a tracepoint to stop it (trace experiment) by reaching its (tracepoint’s) passcount.
  7. Use tfind to inspect values collected during the experiment.

Examining the Program

edit

Each time the program being debugged stops gdb prints the line to be executed. This may not be sufficient to give you an idea about the context. In such a case you need to get a listing of the program lines around the current line. You can do this by using the list command.

list [line]
When passed an argument, list prints lines centered on the line specified by the argument. This value can be specified by giving a line number [in a particular file], function name [in a specific file], an address, or an offset.[15] With no argument, list prints lines following the last lines printed. Passing + as argument has the same effect with passing no argument. Passing - as argument causes listing of the lines coming just before the lines last printed.
list first_line , last_line
When passed two arguments, list prints source lines between the first and second arguments.[16] If either of the arguments is missing, with ‘,’ still present, this missing value is figured out by the other argument.

In addition to listing parts of the source file you may want to see the value of a variable or evaluate an expression.

print [/format] expression
Evaluates expression and prints the result using format. Format specification can be any one of the following:
d Interpret the value as an integer and print in signed decimal.
u Interpret the value as an integer and print in unsigned decimal.
x Interpret the value as an integer and print in hexadecimal.
o Interpret the value as an integer and print in octal.
t Interpret the value as an integer and print in binary.
c Interpret the value as an integer and print it as a character constant.
a Treat the value as an address and print.
f Print as a floating value.
A missing format specification will have GDB use an appropriate type. inspect is a synonym of this command.
Expressions can be formed according to the syntax rules of the source language. We can extend this syntax using certain operators. @ treats parts of memory as an array; :: allows you to specify the scope of an identifier. For example, given that seq is of int*,
print *sum_all::seq@10
sees the area of memory pointed to by seq, which is local to a function named sum_all, as an array of size 10 and prints this array.

Each time an expression is evaluated and printed it is also saved in what is called value history. Values found in this history can be referred to by prefixing its order of printing by the $ sign. First value to be printed is $1; second value is $2 and so on. A single $ refers to the most recently printed value, and $$ refers to the value before that. $$n refers to the nth value from the end. So, $$0 is equivalent to $. One thing to keep in mind: value history holds values of expressions, not expressions themselves. That is, given that *name holds "Mete" as its value, what gets stored after printing *name in the value history is "Mete", not "*name".

Values in the value history can be inspected using the values subcommand of show.

show values [ n | + ]
Prints ten entries of value history centered on n. If no argument is passed, it prints the last ten values. When supplied a + as the argument, this command prints ten history values just after the values last printed.

If you do not want to record the result of the evaluation in the value history, you should use the x commands.

x [[/repeat_count format unit] address]
Examines memory starting at address, independently of the underlying real data type. In addition to those used in print, format can be s (for strings) or i (for machine instructions). repeat_count and unit are used to determine how much memory to display: repeat_count * unit bytes. repeat_count is a decimal value while unit can be one of the following:
b Byte.
h Halfword (two bytes).
w Word (four bytes).
g Giant words (eight bytes).
Note that any one of the parameters might be missing. Format defaults to x initially and changes each time you use either x or print. Repetition count defaults to 1. Unit size specified for a particular value is accepted as a default next time same format is used.

When your program stops, you may want to know where it stopped and how it got there. This requires information about the run-time stack, which are provided by the following commands. Note that these commands do not change the control flow of the program. That is; issuing next after a combination of the following commands will still continue execution from the most recently hit breakpoint. In other words, none of the following commands modifies the instruction pointer.

frame [frame_no | address]
Moves you from one stack frame to another and prints information about the stack frame you select. Stack frame can be selected by passing either the address or number of the frame. frame_no is a non-negative number, where the currently executing frame is 0, its caller is 1, and so on.
backtrace [[(- | +)] number_of_frames]
Prints a summary of how your program got where it is. When used with no argument, this command prints a backtrace of the entire stack; with a positive number, it prints the innermost (most recent) n stack frames; with a negative value passed as an argument, it prints the least recent n stack frames. A synonym for this command is info stack.
up [n]
Moves n frames up the stack. That is, it moves toward less recently created frames. Default value for n is 1.
down [n]
Moves n frames down the stack. In other words, it moves toward more recent stack frames.

In addition to these, we may get further information through the use of info subcommands.

info frame [address]
Prints a description of the frame without selecting it.
info args
Prints the arguments of the selected frame.
info locals
Prints the local variables of the selected frame.
info variables [reg_exp]
Displays all globals and static variable names used in the program. When passed an argument, it displays all such information items matching the regular expression.
info scope (function_name | *address | line_num)
Lists the variables local to a scope. This listing includes the address of arguments and local variables relative to frame base.

Debuggers with Graphical User Interface

edit

If you are not in for the extra flexibility of the [incomplete!] command list presented in the previous section, you may want to try one of the following debuggers, which basically put a graphical interface on top of GDB.

Debugger Description
rhide This development environment can be used from within both DJGPP and Cygwin. Typing rhide at the command prompt of either environment will put you in a DOS window. Note that this is a complete IDE, which happens to include a GDB-based debugger.
GNU DDD In addition to Linux distributions, this debugger can be used from Cygwin. In order to start this debugger from the Cygwin command prompt, you need to first issue an xinit command. Following this, entering ddd will open up a graphical debugger window.
insight Coming with the RedHat distribution of Linux, this debugger can be started by entering insight at the command prompt.
kdbg Part of the SUSE distribution of Linux, this debugger can be started by entering kdbg at the command prompt.

In addition to these, you can try GNU Emacs, XEmacs, Eclipse, KDevelop, NetBeans and MS Visual Studio, which also offer integrated debugging support. One final thing you need to keep in mind: not all debuggers/environments support all debugging formats. For instance, an executable containing GDB debugging information will not make any sense to MS Visual Studio.

Notes

edit
  1. There is a rarely used option of invoking the garbage collector explicitly.
  2. However, there are tools such as lint or splint, which can be satisfactorily used to this end.
  3. As an example to pointers allocated statically consider the head of a list. As an example to pointers allocated dynamically consider the next node link info in the same list. But whether the pointer is allocated statically or dynamically and what it points to is in the static data area or run-rime stack, one thing is certain: dynamically allocated memory is manipulated through pointers.
  4. Note the argument passed to the String_Create function may point to memory in any one of the three regions.
  5. Presuming you use the underlying object through functions.
  6. For an example to the latter method, see the Object-Based Programming chapter.
  7. After all, the former has a static extent while the latter is allocated at entry to main and de-allocated on exit, which is practically equivalent to the first case.
  8. We can still simulate dynamic data structures by defining very large static data structures and using this [static structure] as our heap. Indeed, this method is used to implement dynamic structures in programming languages such as pre-90 FORTRAN and COBOL. Aside from its inability to provide 100% immunity from overflow, this approach is rather wasteful of memory.
  9. Note that the figure does not reflect the C calling convention accurately. Since the area in the run-time stack will be discarded by the caller, it is a bit premature to cross out the bottom half.
  10. It should be noted this command and the next two are probably issued by two different parties: implementer and user, respectively. Not forecasting the possibility of debugging or [more likely] not willing to give away information about the implementation, implementer will probably not provide you with debug versions of the object modules. In such a case, source-code level debugging of the program is not a possibility when control flow is directed into one of the functions found in the module.
  11. Core dump is the disk file containing the contents of random access memory at the point of crash, which can later be used to figure out the cause of failure.
  12. In our presentation, command abbreviations will be indicated by underlining parts of the relevant command.
  13. Throughout the rest of this handout, we will underline part of the commands to express the shortest possible prefix that stands for the command in question.
  14. ( , ), [, ], | are metacharacters; they are not part of the syntax used to form the arguments passed to the commands. ( and ) are used for grouping; [ and ] are used for expressing an option; | is used to separate alternatives.
  15. Offset value is added to the last line printed.
  16. When used with offset values, second argument is added to the value obtained by adding the first argument to the line to be printed.