Introduction to Programming Languages/Garbage Collection

Garbage collection

Garbage collection (GC) is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program. Garbage collection is often portrayed as the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. Garbage collection, like other memory management techniques, may take a significant proportion of total processing time in a program and can thus have significant influence on performance.

In this section we will describe three forms of Garbage Collection.

Mark and sweep

Naive Mark and Sweep in action on a heap containing eight objects. Arrows represent object references. Circles represent the objects themselves. Objects #1, #2, #3, #4, and #6 are strongly referenced from the root set. On the other hand, objects #5, #7, and #8 are not strongly referenced either directly or indirectly from the root set; therefore, they are garbage.

A Mark and Sweep collector tries to find the live heap links and mark those blocks that are reachable. Then it makes a pass over the heap and returns unmarked free blocks to the free pool.

The mark-and-sweep method is the first known strategy for garbage collection. In this method, the garbage collector scans the program, starting from a point called root set (a set of objects easily identified in the program) and tries to reach other objects from then. The collector repeats this scanning until it cannot find any more reachable objects. Every time an object is found, its "being used" bit is set. After the collector finishes marking objects, it goes through the whole heap freeing those objects that do not have the "being used" bit set.

This method has several disadvantages, the most notable being that the entire system must be suspended during collection. This will cause programs to 'freeze' periodically (and generally unpredictably), making real-time and time-critical applications impossible.

Copying collection

In a Copying Collection collector memory is divided in two; only one half is used at a time. When used half is full, copy used blocks to the other location, and erase the old one.

This method has two big disadvantages. First, it has to stop the program's execution to move objects from one part of the heap to another. Second, it has to change the addresses to which program objects point to. This involves changing the values of variables in the program. In languages like C you store an object's address in an integer, making even harder for the garbage collector to find out which program variables store objects' addresses.

The biggest advantage of Copying Collection is the fact that if the size of the memory used by the program is less than the size of the half of the heap that is being used, no copying will be necessary, thus avoiding stopping the program and changing objects' addresses.

Reference counting

In a Reference Counting collector each block has a counter of heap links to it. This counter is incremented when a heap link is copied, decremented when the link is discarded. When the counter goes to zero, the block is freed. Compared to tracing garbage collection, reference counting guarantees that objects are destroyed as soon as they become unreachable.

The biggest advantage of Reference Counting is the fact that the program does not need to stop in order to perform garbage collection.The biggest disadvantages are:

Extra space is used to store the reference counter.
If two or more objects refer to each other, they can create a cycle where neither will be collected as their mutual references never let their reference counts become zero.