Introduction
In my previous article on this topic, I demonstrated how we can create a
dictionary class (a 'Lexicon' class) that can have duplicate keys. This differs
from the dictionary classes in the .NET Framework itself which do not permit
duplicates.
I mentioned in the Conclusion to that article, that I was working on a number of
adjuncts to the Lexicon class which I am now presenting in this article.
The OrderedLexicon class
This is a Lexicon where the keys are sorted but the corresponding lists of
values are not. The values remain in the order they were originally added
(unless the order has been changed since) and so I decided on the name of
OrderedLexicon for this class.
OrderedLexicon<K, V> is a wrapper for a SortedDictionary<K, List<V>> and
contains mostly the same members as the Lexicon<K, V> class. However, as the
SortedDictionary class doesn't implement as many interfaces, or contain as many
constructors, as the generic Dictionary class then the OrderedLexicon class
reflects these omissions.
The main constructors it does contain are listed in the following table with a
brief description of what they do :
The Comparer property returns an IComparer rather than an IEqualityComparer as
we are now concerned with the sort order of the keys and not just their
equality.
The SortedLexicon class
This is a Lexicon in which both the keys and the corresponding lists of values
are sorted. A different IComparer can be used for each or you can use the
default comparers for the types in question.
Again, SortedLexicon<K, V> is a wrapper for a SortedDictionary<K, List< V>> and,
apart from a few minor differences, contains the same members as the
OrderedLexicon<K, V> class. In particular, the constructors are exactly the
same.
The Comparer property has been renamed KeyComparer in this class and there's
also a ValueComparer property which has a (private) valueComparer backing field.
All the methods and properties of the SortedLexicon class ensure that the sort
order of the value lists is maintained when items are added, inserted, removed,
or changed.
However, because the value lists are not copied when they are accessed, it is
possible for the user to 'manually' change the order of the items in the list
and so it is not therefore feasible for this class to guarantee that the values
will always emerge in sorted order.
The alternative would have been to always copy or always re-sort the lists
before they are accessed but I felt that the cost of this would have been far
too expensive in memory and/or processing time to solve what is a relatively
minor problem.
If you do want to manipulate the lists manually, you should therefore either
ensure that the correct order of values in maintained or call this additional
method to do it for you:
Lexicons where duplicate keys are relatively sparse
As mentioned in the previous article the Lexicon class, as currently
implemented, is not ideal for dealing with situations where there are far less
duplicate keys than single keys. This is because the value for a single key
still has to be wrapped within a list whose internal array is given a capacity
of four when the first element is added. This inevitably means that a Lexicon
will use a lot more memory than a Dictionary would for single keys, the only
type of key the latter can accommodate.
I have therefore been considering alternative ways in which the Lexicon could be
implemented to avoid or reduce the impact of this problem.
One way would to modify the key so that, instead of just the key itself, a tuple
consisting of both the key and the index of a particular value would be used.
There would then only be a single value for each modified key. However, for this
approach to be remotely efficient, you need to maintain a second dictionary so
you know how many values there are for each key. Even then you still need 'n +
1' dictionary lookups to get all 'n' values for a given key whereas only one lookup is
needed for the Lexicon. The 'housekeeping' for this approach, when values are
inserted or deleted, is also more onerous.
As it's not very clear how much memory the 'tuple' approach would save at the
expense of more processing time, I decided not to pursue it further.
A better idea would be store only the duplicate keys in lists. To do that you
would need two internal dictionaries: one for single values and one for
duplicate values. There would still be additional 'housekeeping' needed to move
keys between the dictionaries as values are added or removed but I felt this was
manageable.
The problems came when I tried to implement the 'two dictionaries' approach. It
simply wasn't possible to do it in a natural, unified way. As you have no idea
in advance whether a key will have duplicate values, it is necessary to check
this first and then construct a list consisting of all the values. You also need
different methods depending on whether you are returning a single value or a
list. Moreover, this class doesn't sit well with the existing dictionary and
lexicon classes; it is neither one nor the other and so can't sensibly implement
either the generic IDictionary or ILexicon interfaces.
On balance, I think it is therefore best to leave it to the user to make any
necessary separation between single and duplicates keys. In many situations, you
simply won't know in advance whether the duplicate keys will be sparse or not
and it is therefore better to load the data into a Lexicon initially. If the
duplicate keys are then sparse, you can leave them in the Lexicon but move as
many single keys as possible out to a separate Dictionary to save memory.
To assist with this, I've added two new members to all three Lexicon classes:
Lexicons with empty lists of values
The initial implementation of the Lexicon class didn't deal too gracefully with
these by assuming they were always errors and throwing exceptions in some
circumstances. In practice, an empty list might arise temporarily if the user is
manipulating lists directly though otherwise it probably is an error.
It is not, in any case, feasible to prevent empty lists from arising when a user
is manipulating the lists directly because (as discussed in the SortedLexicon
section) the cost of copying the lists before they are accessed would be just
too high.
I have therefore 'tightened up' the code of all the Lexicon classes to ensure
that empty lists cannot arise if you confine yourself to the methods and
properties of those classes. Otherwise empty lists are regarded as legitimate
and shouldn't cause any more problems than non-empty lists.
The following methods have been added to reduce the need for direct
manipulations and to remove empty lists which may have been forgotten about:
Notice also, that if a key does not exist in the Lexicon, the GetValueCount
method now returns -1 rather than 0. This is to distinguish it from an existing
empty key which still returns 0.
ToLexicon() extension method
For those using .NET 3.5 or later, I've added a ToLexicon() extension method for
use in LINQ queries. This is analogous to the Enumerable.ToDictionary()
extension method in the .NET framework except that it does not throw an
exception if multiple values are generated for the same key.
As in the case of ToDictionary(), there are four overloads of this method.
Example of usage
The code for all three Lexicon classes, the ILexicon interface (unchanged since
the previous article) and the ToLexicon() extension method can be downloaded
from the link accompanying this article as it is far too long to include in the
body of the article itself. All these types are included in a namespace called
Lexicons and, apart from the extension method, can be used in .NET 2.0 or later.
The following console application contrasts the three Lexicon classes and also
contains sample code for the new members:
using
System;
using
System.Collections;
using
System.Collections.Generic;
using
Lexicons;
class
City
{
private
string name;
private
string country;
public
string Name { get
{ return name; } }
public
string Country { get
{ return country; } }
public City(string
name, string country)
{
this.name = name;
this.country = country;
}
}
class
Program
{
static
void Main(string[]
args)
{
Console.WriteLine("Unordered\n");
Lexicon<string,
int> lex = new
Lexicon<string,
int>();
lex.Add("Dave", 6);
lex.Add("John", 1);
lex.Add("Dave", 3);
lex.Add("Stan", 4);
lex.Add("Dave", 2);
lex.Add("Fred", 5);
foreach (var
kvp in lex)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
Console.WriteLine("\nOrdered
by key\n");
OrderedLexicon<string,
int> olex = new
OrderedLexicon<string,
int>(lex);
foreach (var
kvp in olex)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
Console.WriteLine("\nOrdered
by key, then by value\n");
SortedLexicon<string,
int> slex = new
SortedLexicon<string,
int>(lex);
foreach (var
kvp in slex)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
Console.WriteLine();
// create
an empty key
slex["John"].Clear();
// mess
up the order of another key's values
slex["Dave"].Insert(0,
7);
//
analyze keys
int m, s, e;
int t = slex.AnalyzeKeys(out
m, out s, out e);
Console.WriteLine("\nMultiple
{0}, Single {1}, Empty {2}\n", m, s, e);
// remove
empty keys
slex.RemoveEmptyKeys();
//
transfer single keys to a Dictionary
Dictionary<string,
int> dict = new
Dictionary<string,
int>();
slex.MoveSingleKeys(dict);
Console.WriteLine("\nOrdered
by key, then by value, after some removals\n")
foreach (var
kvp in slex)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
//
reorder "Dave"
slex.SortValueList("Dave");
Console.WriteLine("\nOrdered
by key, then by value, after some removals and resorting\n");
foreach (var
kvp in slex)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
// create
a list of City objects
List<City>
cities = new List<City>();
cities.Add(new
City("Delhi",
"India"));
cities.Add(new
City("Madrid",
"Spain"));
cities.Add(new
City("New York",
"U.S.A"));
cities.Add(new
City("Mumbai",
"India"));
// create
a new lexicon from the list of cities
Lexicon<string,
string> lex2 = cities.ToLexicon(c => c.Country,
c => c.Name)
Console.WriteLine("\nNew
lexicon created from list of cities\n");
foreach (var
kvp in lex2)
{
Console.WriteLine("{0}
: {1}", kvp.Key, kvp.Value);
}
Console.ReadKey();
}
}
Example output
A screenshot of the output is shown below:
Conclusion
As mentioned in the Conclusion to the original article, the Lexicon classes are
not intended to be replacements for the generic Dictionary and SortedDictionary
classes and should only be used in situations where keys may be duplicated.
I hope you will find them to be useful in these circumstances.