Demystify Garbage Collection in C#: Part 5

Welcome to the Demystify Garbage Collection article series. This is the fifth part in the series. If you are new to this series I recommend you have a look at the previous articles in the series, although this article is not a continuation any previous one. The following are links to the other articles.

In this article we will see how the Garbage Collection algorithm works. We will next see how to implement a weak reference to optimize performance. Before starting with the Garbage Collection algorithm we will first try to understand the concept of Application Root.

What is Application Root?

Every application has a set of roots. Roots identify storage locations that refer to objects on the managed heap or objects that are set to null.

[Before writing this article I made a Google search using "Application Root in heap memory" as the search string and got a couple of articles related to that topic. Then I started to visit each one of them. What a surprise!! All articles have the same content. All are copied from each other. But I don't know which one is the original and which one is copied. And I also copied the previous two lines from them.]

Now, let me start in my own words. Each application has a collection of application roots. Those roots point to various storage locations. Basically they point to the heap and the stack or they may point to a CPU register. Now, why are there many application roots when we have only one application?

OK, let's think about an application with various data types. In other words, a few data types are integer type and a few are object type (in other words class variable). As we know, in general, a value type is created on the stack and a reference type is created on the heap. Again, if you specify a register storage location then the variable may be created in a CPU register (yes, it may, because there are a few conditions associated with it). Now, since storage locations are different, as usual various root pointers are needed to point to various storage locations. That is the reason behind multiple application roots.

So, one concept is clear. Why are there many root pointers? The next concept is what they do.

Each and every pointer points to a certain memory address because those addresses are needed by the JIT compiler and the Garbage Collector algorithm in program execution. In this article we will see how the Garbage Collector creates a memory graph of all live objects.

How allocations are done in heap memory

Before starting with the Garbage Collection algorithm, it is very essential to understand the heap and how allocation happens in heap memory. The Heap is nothing but a storage location and it is not properly ordered. We know that object types or reference type variables are created in memory. Now the question is, in which location? The answer is, in the next blank location, as in the following:

image1.gif

In this image we are seeing the memory allocation of heap memory. Since it is not properly managed like stack memory, it gets created in the next free location. Then the heap pointer keeps track of the next free space in heap memory.

Now, the concept is that both dead and live objects may stay in memory. And when the Garbage Collector runs, the memory for dead objects are claimed and collected. All live objects are then rearranged in the heap. This is the process that takes place in the heap by Garbage Collection.

Garbage Collection algorithm

I hope we now have a basic understanding of heap memory and the object creation process. Here we will discuss the Garbage Collection algorithm. First, Garbage Collection occurs in heap memory and the second point is that no one knows (yes, nearly no one except maybe the Microsoft guys know; Ha...Ha) when it will run and collect unused objects from the heap. The most we may say is that Garbage Collection occurs when there is a shortage of memory for the application. But .NET knows when the best time is to run the operation.

The Garbage Collection algorithm performs operations in two phases.

  1. Mark
  2. Compact

The first phase is the marking phase, in other words in this phase the Garbage Collector marks the dead objects to be collected in the next phase. In the second phase the marked objects are collected. Let's see those phases more closely.

Phase 1: Mark

image2.gif

As the name suggests, in this phase the Garbage Collector marks all unused object(s) in heap memory. When the Garbage Collector starts, it assumes that all objects in memory (the heap) are garbage.

  1. The Garbage Collector identifies the Application Root of the application.
  2. It starts to reach each and every object present in the heap from the application root and internally it starts to maintain a graph containing all objects reachable from the root.
  3. If an object is encountered that is already present in the graph then the Garbage Collector changes its path and attempts to find another route to reach the objects. Thus the Garbage Collector avoids cycles and infinite loops.
  4. Once the entire object is traversed, the Garbage Collector's graph contains the set of objects that are reachable from the Application's Root and those that are live objects.
    And the objects that are not reachable from the Application's Root are called garbage and will be collected in the next phase.

Phase 2: Compact

image3.gif

In this phase all live object come down to the bottom of the heap leaving space in a top memory location (of course in the heap). Let's see that in a few steps.

  1. The Garbage Collection program traverses each and every memory location in the heap to check for continuous memory of garbage objects.
  2. Then it shifts all non-garbage objects to lower portions of memory and removes all gaps in memory (gaps due to claimed memory in the garbage object's position).
  3. Now, when all objects have been shifted into a new location, all their pointers need to be revised. The Garbage Collector adjusts all the pointers of the objects. So if one pointer points to another pointer (in other words another memory location) then the Garbage Collector is responsible for adjusting the pointer.

This is the second phase of the Garbage Collection operation.

Conclusion

Here we have discussed Garbage Collection and in the next article we will discuss why the Garbage Collection operation is a very costly one. I hope you have enjoyed this article.

Next Recommended Readings