Even though the .NET Framework frees us from actively managing memory and garbage collection (GC), we still have to keep memory management and GC in mind in order to optimize the performance of our applications. Having a basic understanding of how memory management works also helps explain the behavior of the variables we work with in every program we write. In this article we'll look into garbage collection and some ways to keep our applications running efficiently.
Graphing
Let's look at this from the GC's point of view. If we are responsible for "taking out the trash", we need a plan to do it effectively. Obviously, we need to determine what is garbage and what is not (this might be a bit painful for the pack rats out there).
To determine what needs to be kept, we'll first assume that everything not being used is trash (those piles of old papers in the corner, the box of junk in the attic, everything in the closets, etc.). Imagine we live with our two good friends: Joseph Ivan Thomas (JIT) and Cindy Lorraine Richmond (CLR). Joe and Cindy keep track of what they are using and give us a list of things they need to keep. We'll call this initial list our "root" list because we use it as a starting point. We'll keep a master list that graphs where everything we want to keep is in the house. Anything needed to make the things on our list work is added to the graph: if we're keeping the TV, we don't throw out the TV's remote control, so it goes on the list; if we're keeping the computer, the keyboard and monitor go on the "keep" list too.
This is how the GC determines what to keep as well. It receives a list of "root" object references to keep from the just-in-time (JIT) compiler and the common language runtime (CLR) (remember Joe and Cindy?) and then recursively searches object references to build a graph of what should be kept.
Roots consist of:
- Global/static pointers. One way to make sure our objects are not garbage collected is to keep a reference to them in a static variable.
- Pointers on the stack. We don't want to throw away what our application's threads still need in order to execute.
- CPU register pointers. Anything in the managed heap that is pointed to by a memory address held in a CPU register should be preserved (don't throw it out).
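To see a static root at work, here is a small sketch (the class and method names are hypothetical) that allocates two arrays, keeps one reachable through a static field and the other only through a WeakReference, and then forces a collection with GC.Collect(). A WeakReference lets us observe an object without acting as a root itself. Forcing collections like this is only for experiments, not for production code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class; the names are illustrative, not part of any .NET API.
public static class RootDemo
{
    // A static field acts as a GC root: whatever it references survives collections.
    private static byte[] s_Keeper = Array.Empty<byte>();

    // NoInlining keeps the allocations inside this method's own stack frame,
    // so nothing on the caller's stack keeps them reachable after it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static (WeakReference Rooted, WeakReference Unrooted) Allocate()
    {
        s_Keeper = new byte[1024];                        // reachable from a static root
        var rooted = new WeakReference(s_Keeper);
        var unrooted = new WeakReference(new byte[1024]); // only weakly referenced
        return (rooted, unrooted);
    }

    // Forces a full, blocking collection and reports which arrays survived.
    public static (bool RootedAlive, bool UnrootedAlive) Run()
    {
        var (rooted, unrooted) = Allocate();
        GC.Collect();
        return (rooted.IsAlive, unrooted.IsAlive);
    }
}
```

On a typical run the statically rooted array survives and the unrooted one is reclaimed, although exact collection timing depends on the runtime and build configuration.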
In the above diagram, objects 1, 3, and 5 in our managed heap are referenced from a root: 1 and 5 are referenced directly, and 3 is found during the recursive search. Going back to our analogy, if object 1 is our television, object 3 could be its remote control. After all objects are graphed, we are ready to move on to the next step: compacting.
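The graphing step can be sketched as a plain graph traversal. This toy model is not how the CLR is implemented: objects are just integers, the reference map is made-up example data matching the diagram (object 2's reference to object 4 is an assumption), and it uses an explicit stack instead of recursion, which finds the same set of objects.

```csharp
using System.Collections.Generic;

// Toy model of the GC's graphing (mark) phase. MarkPhase is a hypothetical name.
public static class MarkPhase
{
    // Returns every object reachable from the roots by following references.
    public static SortedSet<int> Mark(IEnumerable<int> roots, IReadOnlyDictionary<int, int[]> references)
    {
        var live = new SortedSet<int>();
        var pending = new Stack<int>(roots);
        while (pending.Count > 0)
        {
            int obj = pending.Pop();
            if (!live.Add(obj)) continue;          // already graphed; also stops cycles
            if (references.TryGetValue(obj, out var children))
            {
                foreach (int child in children)
                    pending.Push(child);           // search this object's references too
            }
        }
        return live;
    }
}
```

Calling Mark(new[] { 1, 5 }, refs) with refs mapping 1 → 3 and 2 → 4 returns 1, 3, 5. Object 4 is not kept even though object 2 references it, because object 2 itself is not reachable from a root.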
Compacting
Now that we have graphed what objects we will keep, we can just move the "keeper objects" around to pack things up.
Fortunately, in our house we don't need to clear out a space before we put something else there. Since Object 2 is not needed, we (as the GC) move Object 3 down and fix the pointer to it in Object 1.
Next, as the GC, we copy Object 5 down.
Now that everything is cleaned up, we just need to write a sticky note and put it at the top of our compacted heap to let Cindy know where to put new objects.
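Compaction can be modeled the same way. In this sketch (hypothetical names and object sizes; no real memory is moved) each surviving object slides down to the next free address. The returned map is what the GC would use to fix up pointers such as the one in Object 1, and the final offset plays the role of the sticky note saying where the next object goes.

```csharp
using System.Collections.Generic;

// Toy model of compaction. Compactor is a hypothetical name.
public static class Compactor
{
    // objects: (id, size) pairs in address order, starting at address 0.
    public static (Dictionary<int, int> NewAddress, int NextObjPtr) Compact(
        IEnumerable<(int Id, int Size)> objects, ISet<int> live)
    {
        var newAddress = new Dictionary<int, int>();
        int next = 0;
        foreach (var (id, size) in objects)
        {
            if (!live.Contains(id)) continue;  // garbage: its space gets reused
            newAddress[id] = next;             // slide the survivor down
            next += size;
        }
        return (newAddress, next);             // next = the "sticky note"
    }
}
```

With five 10-byte objects and survivors 1, 3, and 5, Object 3 moves from address 20 to 10, Object 5 moves from 40 to 20, and the next allocation goes at address 30.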
Knowing the nitty-gritty of GC helps us understand that moving objects around can be very taxing. It also follows that if we reduce the size of what we have to move, we improve the whole GC process, because there is less to copy.
What about things outside the managed heap?
As the person responsible for garbage collection, one problem we run into in cleaning house is how to handle objects in the car. When cleaning, we need to clean everything up. What if the laptop is in the house and the batteries are in the car?
There are situations where the GC needs to execute code to clean up non-managed resources such as files, database connections, network connections, etc. One possible way to handle this is through a finalizer.
class Sample
{
    ~Sample()
    {
        // FINALIZER: CLEAN UP HERE
    }
}
During object creation, all objects with a finalizer are added to a finalization queue. Let's say objects 1, 4, and 5 have finalizers and are on the finalization queue. Let's look at what happens when objects 2 and 4 are no longer referenced by the application and ready for garbage collection.
Object 2 is treated in the usual fashion. However, when we get to object 4, the GC sees that it is on the finalization queue, and instead of reclaiming the memory object 4 owns, it keeps object 4 alive (moving it during compaction) and moves its entry from the finalization queue to a special queue named freachable.
There is a dedicated thread that executes the finalizers of objects on the freachable queue. Once this thread has run Object 4's finalizer, Object 4 is removed from the freachable queue. Then, and only then, is Object 4 ready for collection.
So Object 4 lives on until the next round of GC.
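This extra round is observable. The following sketch (the Finalizable and FinalizerDemo classes are hypothetical) drops the only reference to an object that has a finalizer, forces a collection, and checks a WeakReference created with trackResurrection: true, which keeps tracking the object while it waits on the freachable queue. GC.WaitForPendingFinalizers() blocks until the finalizer thread has run the queued finalizers; a second GC.Collect() then reclaims the memory.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class used only to observe when its finalizer runs.
public class Finalizable
{
    public static volatile bool Finalized;

    ~Finalizable()
    {
        // Runs on the dedicated finalizer thread, not on the thread calling GC.Collect().
        Finalized = true;
    }
}

public static class FinalizerDemo
{
    // trackResurrection: true keeps the weak reference valid while the object
    // sits on the freachable queue, so we can watch it outlive the first GC.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference AllocateAndDrop() =>
        new WeakReference(new Finalizable(), trackResurrection: true);

    public static (bool SurvivedFirstGc, bool Finalized, bool AliveAfterSecondGc) Run()
    {
        Finalizable.Finalized = false;
        WeakReference weak = AllocateAndDrop();

        GC.Collect();                   // unreachable, but has a finalizer: kept for freachable
        bool survivedFirstGc = weak.IsAlive;

        GC.WaitForPendingFinalizers();  // wait for the finalizer thread to drain freachable
        GC.Collect();                   // now the memory can actually be reclaimed

        return (survivedFirstGc, Finalizable.Finalized, weak.IsAlive);
    }
}
```

On a typical run, Run() reports that the object survived the first collection, that its finalizer ran, and that it was gone after the second collection, mirroring the extra round of GC described above.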
Because adding a finalizer to our classes creates additional work for the GC, it can be very expensive and can adversely affect the performance of garbage collection, and thus of our program. Only use finalizers when you are absolutely sure you need them.
A better practice is to clean up non-managed resources ourselves. As you can imagine, where possible it is preferable to explicitly close connections and to use the IDisposable interface for cleanup instead of a finalizer.
IDisposable
Classes that implement IDisposable perform cleanup in the Dispose() method (the only member of the interface). So if we have a ResourceUser class, instead of using a finalizer as follows:
public class ResourceUser
{
    ~ResourceUser() // THIS IS A FINALIZER
    {
        // DO CLEANUP HERE
    }
}
We can use IDisposable as a better way to implement the same functionality:
using System; // IDisposable lives in the System namespace

public class ResourceUser : IDisposable
{
    #region IDisposable Members
    public void Dispose()
    {
        // CLEAN UP HERE!!!
    }
    #endregion
}
IDisposable is integrated with the using statement. At the end of the using block, Dispose() is called on the object declared in using(). The object should not be referenced after the using block, because it should essentially be considered "gone" and ready to be cleaned up by the GC.
public static void DoSomething()
{
    ResourceUser rec = new ResourceUser();
    using (rec)
    {
        // DO SOMETHING
    } // DISPOSE CALLED HERE
    // DON'T ACCESS rec HERE
}
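A using block is essentially shorthand for try/finally. The sketch below writes out approximately what the C# compiler generates for using (rec) { ... } with a class type, and also shows that Dispose() still runs when the block is left through an exception. TrackingResource and UsingDemo are hypothetical stand-ins; TrackingResource simply records whether it was disposed.

```csharp
using System;

// Hypothetical stand-in for ResourceUser that records whether Dispose() ran.
public class TrackingResource : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

public static class UsingDemo
{
    // Roughly what `using (rec) { ... }` expands to.
    public static TrackingResource Expanded()
    {
        var rec = new TrackingResource();
        try
        {
            // DO SOMETHING
        }
        finally
        {
            if (rec != null) ((IDisposable)rec).Dispose();
        }
        return rec; // returned only so a test can inspect it
    }

    // Dispose() is called even when the using block exits via an exception.
    public static bool DisposedEvenOnException()
    {
        var rec = new TrackingResource();
        try
        {
            using (rec)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException) { }
        return rec.IsDisposed;
    }
}
```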
I like putting the declaration of the object in the using statement because it makes more sense visually, and rec is no longer available outside the scope of the using block. While this pattern is more in line with the intention of the IDisposable interface, it is not required.
public static void DoSomething()
{
    using (ResourceUser rec = new ResourceUser())
    {
        // DO SOMETHING
    } // DISPOSE CALLED HERE
}
By using using() with classes that implement IDisposable, we can perform our cleanup without putting additional overhead on the GC by forcing it to finalize our objects.
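One caveat: if a class has both a finalizer and a Dispose() method, the finalizer still runs unless Dispose() tells the GC to skip it. The conventional dispose pattern described in the .NET documentation handles this with GC.SuppressFinalize(this). The sketch below applies that pattern to our ResourceUser; the _disposed flag and the IsDisposed property are additions for illustration, and the finalizer is kept only as a safety net for callers who forget to dispose.

```csharp
using System;

// Sketch of the conventional dispose pattern applied to ResourceUser.
public class ResourceUser : IDisposable
{
    private bool _disposed;

    public bool IsDisposed => _disposed;

    public void Dispose()
    {
        Dispose(disposing: true);
        GC.SuppressFinalize(this);   // finalizer won't run: no trip through freachable
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;       // calling Dispose() twice is harmless
        if (disposing)
        {
            // Release managed resources here (e.g. other IDisposable fields).
        }
        // Release unmanaged resources here (e.g. native handles).
        _disposed = true;
    }

    ~ResourceUser()
    {
        Dispose(disposing: false);   // reached only if Dispose() was never called
    }
}
```

Used with using(), Dispose() runs at the end of the block and the object never has to wait on the freachable queue.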
Static Variables: Watch Out!
class Counter
{
    private static int s_Number = 0;

    public static int GetNextNumber()
    {
        int newNumber = s_Number;
        // DO SOME STUFF
        s_Number = newNumber + 1;
        return newNumber;
    }
}
If two threads call GetNextNumber() at the same time and both are assigned the same value for newNumber before s_Num}