In this article, I am going to give a brief idea of List, HashSet and SortedSet performances in various situations in terms of their time complexity. This also shows how we can remove duplicates from a List using HashSet or SortedSet.
In .Net, as we know we have many generic collections available such as List, Dictionary, HashTable, HashSet, HashMap, SortedSet etc. These choices sometimes makes us a little confused if we don't have much of an idea of them; what they do or what purpose they serve. The decision of which collection to use depends upon our requirements. For example, sometimes we could go for that collection which supports specialized operations; sometimes we select that collection which has good performance characteristics for required operations. Here now here we are going to discuss two of them: HashSet and SortedSet collections.
Sometimes, in programming we are not sure whether our List does not have duplicate elements. If we want to remove all the duplicates elements, then we can use a collection, eiher "HashSet" or "SortedSet" (if we want them to be in order).
For a List Collection, we can get an idea about "How we can use a List Collection and its built-in methods using lambda expressions" from a previous article:
http://www.c-sharpcorner.com/UploadFile/0f68f2/using-lambda-expression-over-a-list-in-C-Sharp/
What is HashSet
- HashSet holds a set of objects, but in a way that it allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hash code of the object.
- HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove and Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.)
- The only catch of HashSetis is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that.
- Since there is no Built-in Sort Method, enumerating the elements in a sorted order forces you to copy the items to a different collection (like a List<T>) and sort the resulting list. Sort is typically an O(n log n) operation.
What is SortedSet
- SortedSet is an ordered set collection. You have many elements you need to store, and you want to store them in a sorted order and also eliminate all duplicates from the data structure.
- SortedSet does not include hashing, meaning that it has to do linear searches for lookups. Therefore, the SortedSet is much slower than the HashSet for most cases where you need to do lookups.
- Internally, the SortedSet is implemented as a tree with a Root node, and a Left and Right node on every node instance. This is the typical implementation of a binary tree. Every node will need to be allocated on the managed heap. The sorted set ensures that the elements in the set are always in sorted order. Every Add operation places the new element in the correct location in the set. That means Add is an O(log n) operation.
- The SortedSet<T> must perform a binary search to find the correct location for the new element.
Comparative Performance of the collections
So, you might be bit confused, which collection we should use for which scenario. Here is a comparative time complexity of List, HashSet and SortedSet in the following basic operations:
How to use HashSet and SortedSet
Suppose we have a list of elements and we want to remove duplicates. Here is the following class that shows use of HashSet and SortedSet:
class DuplicatesRemover
{
List<int> ListOfItems = new List<int>() { 12,3,4,6,3,4,5,1,22,21,4,12,13,2,3,4,16,3,2,12,4,5,4,2,16 };
//Hash Set
HashSet<int> ItemSetWithoutDuplicates = null;
// Sorted Set
SortedSet<int> SortedItemsWithoutDuplicates = null;
public void PrintList()
{
Console.WriteLine("The elements in List are: \n");
foreach (int item in ListOfItems)
{
Console.Write(item + ", ");
}
}
public void CreateSets()
{
//We can create Hash Set/Sorted Set directly from a List during instatiation
// OR we could create them by copying elements from List one by one using for each loop
ItemSetWithoutDuplicates = new HashSet<int>(ListOfItems);
SortedItemsWithoutDuplicates = new SortedSet<int>(ListOfItems);
}
public void IteratingElementsInSets()
{
Console.WriteLine("\n\nThe elements in Hash Set are: \n");
foreach (int item in ItemSetWithoutDuplicates)
{
Console.Write(item + ", ");
}
Console.WriteLine("\n\nThe elements in Sorted Set are: \n");
foreach (int item in SortedItemsWithoutDuplicates)
{
Console.Write(item + ", ");
}
}
public void AddElementsInSets()
{
//Adding '42' in the List
Console.WriteLine("\n------------------------------------------------------------");
Console.WriteLine("\nAdding '42' in the List");
ItemSetWithoutDuplicates.Add(42);
SortedItemsWithoutDuplicates.Add(42);
Console.WriteLine("\n\nThe elements in Hash Set are: \n");
for (int i = 0; i < ItemSetWithoutDuplicates.Count; i++)
{
Console.Write(ItemSetWithoutDuplicates.ElementAt(i) + ", ");
}
Console.WriteLine("\n\nThe elements in Sorted Set are: \n");
for (int i = 0; i < SortedItemsWithoutDuplicates.Count; i++)
{
// SortedItemsWithoutDuplicates[i]
Console.Write(SortedItemsWithoutDuplicates.ElementAt(i) + ", ");
}
}
public void RemoveElementsFromSets()
{
//Removing '22' from the List
Console.WriteLine("\n------------------------------------------------------------");
Console.WriteLine("\nRemoving '22' from the List");
ItemSetWithoutDuplicates.Remove(22);
SortedItemsWithoutDuplicates.Remove(22);
Console.WriteLine("\n\nThe elements in Hash Set are: \n");
for (int i = 0; i < ItemSetWithoutDuplicates.Count; i++ )
{
Console.Write(ItemSetWithoutDuplicates.ElementAt(i) + ", ");
}
Console.WriteLine("\n\nThe elements in Sorted Set are: \n");
for (int i = 0; i < SortedItemsWithoutDuplicates.Count; i++)
{
// SortedItemsWithoutDuplicates[i]
Console.Write(SortedItemsWithoutDuplicates.ElementAt(i) + ", ");
}
//Removing all elements which are greater than '15'
Console.WriteLine("\n------------------------------------------------------------");
Console.WriteLine("\nRemoving all elements which are greater than '15' using Lambda Expression ");
ItemSetWithoutDuplicates.RemoveWhere(e => e > 15);
SortedItemsWithoutDuplicates.RemoveWhere(e => e > 15);
Console.WriteLine("\n\nThe elements in Hash Set are: \n");
for (int i = 0; i < ItemSetWithoutDuplicates.Count; i++)
{
Console.Write(ItemSetWithoutDuplicates.ElementAt(i) + ", ");
}
Console.WriteLine("\n\nThe elements in Sorted Set are: \n");
for (int i = 0; i < SortedItemsWithoutDuplicates.Count; i++)
{
// SortedItemsWithoutDuplicates[i]
Console.Write(SortedItemsWithoutDuplicates.ElementAt(i) + ", ");
}
}
}
On running the following Main Program, we get the output as:
class Program
{
static void Main(string[] args)
{
DuplicatesRemover oHashSetDemo = new DuplicatesRemover();
oHashSetDemo.PrintList();
oHashSetDemo.CreateSets();
oHashSetDemo.IteratingElementsInSets();
oHashSetDemo.AddElementsInSets();
oHashSetDemo.RemoveElementsFromSets();
Console.Read();
}
}
Thanks, Hemant Srivastava.