Memory Management/Garbage Collection

We've already seen how complicated memory management can be, especially if you are allocating and freeing memory manually. Thankfully, there is a class of systems known as garbage collectors which can help to automate the process of memory reclamation.

What are Garbage CollectorsEdit

Garbage Collectors (GC) are systems or sub-systems that are used to manage dynamic memory automatically. Here is how they work:

  1. Instead of calling malloc and free directly, these are replaced with a function from the GC called "gc_malloc". Obviously, this function can be named anything in practice.
  2. When we call gc_malloc in our program, the garbage collector calls malloc to allocate memory from the system, and find some way to keep track of the memory. There are many ways to track memory, and we will discuss some of them in future chapters
  3. The program runs like normal, allocating memory from the GC when needed, but never freeing the memory explicitly.
  4. The GC performs a function called a trace intermittently. This can be a synchronous or asynchronous occurrence. During the trace, the GC starts with a set of objects that are immediately visible to the system, called the root set of memory objects. It follows pointers in these memory objects to child-objects. When it reaches an object, it marks it as being alive.
  5. When the GC has finished the trace, and no more pointers can be followed, the reclamation phase begins. All objects which are not marked alive are considered dead, because no pointers from the program point to them, and therefore they cannot be possibly accessed by the program. All dead objects are freed by the GC.

Types of CollectorsEdit

There are two primary types of garbage collectors, although often a hybrid approach is found between these to suit particular needs. The first type, the one which might be the most intuitive, is a reference counting collector. The second one, which is most similar to what we described above, is a tracing collector.

Reference Counting CollectorEdit

When a new memory object is allocated by the GC, it is given an integer count field. Every time a pointer is made to that object, a reference, the count is increased. So long as the count is a positive non-zero integer, the object is actively being referenced and is still alive.

When a reference to the object is removed, the count is decremented. When the count reaches zero, the object is dead and can be immediately reclaimed.

There are a number of points to remember about Reference Counting collectors:

  1. Circular references will never be reclaimed, even if the entire set of objects is dead.
  2. Reference counting is pervasive: The entire program must be made aware of the system, and every pointer reference or dereference must be accompanied by an appropriate increment or decrement. Failing to maintain the count, even once in a large program, will create memory problems for your program.
  3. Reference counting can be costly, because counts must be manipulated for every pointer operation, and the count must be tested against zero on ever decrement. These operations can, if used often enough, create a performance penalty for your program.

These types of collectors are often called cooperative collectors because they require cooperation from the rest of the system to maintain the counts.

Tracing CollectorEdit

Tracing collectors are entirely dissimilar from reference counting collectors, and have opposite strengths and weaknesses.

When the Tracing GC allocates a new memory chunk, the GC does not create a counter, but it does create a flag to determine when the item has been marked, and a pointer to the object that the GC keeps. The flags are not manipulated by the program itself, but are only manipulated by the GC when it performs a run.

During a GC run, the program execution typically halts. This can cause intermittent pauses in the program, pauses which can be quite long if there are many memory objects to trace.

The GC selects a set of root objects which are available to the current program scope and parent scopes. Starting from these objects, the GC identifies all pointers within the objects, called children. The object itself is marked as being alive, and then the collector moves to each child and marks it in the same way. The memory objects form a sort of tree structure, and the GC traverses this tree using recursive or stack-based methods.

At the end of the GC run, when there are no more children to be marked, all unmarked objects are considered unreachable and therefore dead. All dead objects are collected.

A few points to remember about Tracing GCs:

  1. Tracing GCs can be used to find cycles, memory objects whose pointers form circular structures. Reference Counting schemes cannot do this.
  2. Tracing GCs cause pauses in the program, and these pauses can become unbearably long in some complex programs that use many small memory objects.
  3. Dead objects are not reclaimed immediately. Reclamation only occurs after a GC run. This causes a certain inefficiency in memory usage.
  4. Tracing collectors do not require the program to account explicitly for memory counts or memory status updates. All memory tracking logic is stored inside the GC itself. This makes it easier to write extensions for these systems, and also makes it easier to install a Tracing GC in an existing system then to install a Reference Counting one.

Tracing GCs are often called uncooperative collectors because they do not require cooperation from the rest of the system to function properly.

Hybrid CollectorsEdit

Sometimes, reference counting schemes will utilize Tracing systems to find cyclical garbage. Tracing systems may employ reference counts on very large objects to ensure they are reclaimed quickly. These are just two examples of hybridized garbage collectors that are more common then either of the two "pure" types described above.

In later chapters, we will discuss garbage collectors and their algorithms in more detail.

Last modified on 3 June 2008, at 02:40