As far as .NET generic collections go, you’ve pretty much got two basic types. There’s List<T>, which gives you an array-style type-safe collection of objects. And then there’s Dictionary<tkey, tval> (and a number of other related types), which you can use for hash-style type-safe collections. But what if you want a collection of objects that you access by key and index? Or, more subtly, take this example:
class Thing {
private string txt;
public Thing(string text)
{
txt = text;
}
public override string ToString()
{
return txt;
}
}
class Stuff : Thing
{
public Stuff(string text) : base(text) { }
}
class Program
{
static void Main(string [] args)
{
Dictionary<Stuff, Thing> stuffAndThings = new Dictionary<Stuff, Thing>();
stuffAndThings.Add(new Stuff("1: "), new Thing("One Thing"));
stuffAndThings.Add(new Stuff("2: "), new Thing("Two Thing"));
stuffAndThings.Add(new Stuff("3: "), new Thing("Red Thing"));
stuffAndThings.Add(new Stuff("4: "), new Thing("Blue Thing"));
foreach (KeyValuePair<Stuff, Thing> pair in stuffAndThings)
{
Console.WriteLine(pair.Key.ToString() + pair.Value.ToString());
}
}
Which Outputs:
1: One Thing
2: Two Thing
3: Red Thing
4: Blue Thing
Right? Well, maybe not…
In all the MSDN descriptions of .NET Generic hash-type collections we see this perfectly clear little statement “The order in which the items are returned is undefined.” Fantastic. Although I’ve never seen items returned in any order other than the one they were entered, I guess we can’t rely on it. If anyone has any details of how/why/when items will be returned in any other order, I’d love to hear it.
So, now we’re looking for a class that will store key-value pairs that’s accessible by index (which also implies it will return items in the order entered). There is the System.Collections.Specialized.NameValueCollection class, which does exactly this, but only for string-string pairs. Unfortunately there is no .NET framework class to do this generically. The closest thing is the KeyedCollection<tkey, tval> class, which is abstract (and has some other interesting qualities that we’ll look at later), and the IOrderedDictionary class, which, as you can probably guess, is just an interface. So it looks like we’re up to writing our own class. Or search around and see if someone’s already done it for us (of course they have!).
After a bit of searching I found a couple implementations that do the job nicely, and discuss some of the performance plusses and minuses.
MarcClifton.com – KeyedList<tkey, tval>
CommonGenius.com – OrderedDictionary<tkey, tval>
It turns out that these two implementations are pretty similar.
The KeyedList class implements IDictionary<TKey, TVal> and IList<KeyValuePair<TKey, TVal>>.
The OrderedDictionary class implements IOrderedDictionary and IDictionary<TKey, TVal>.
Both classes have an internal Dictionary<Tkey, TVal> field and List<KeyValuePair<TKey, TVal>> field for storing all the data. This means that all the data stored by this class is doubled (or at least the number of references to each piece of data is doubled). Additionally, all of the add and remove actions need to be applied to each internal object, so the time for these operations is increased. As David Nelson (the OrderedDictionary<tkey, tval> author) points out, “OrderedDictionary should only be used when insert/delete operations will be minimal, AND there is a need to efficiently access elements by index and/or key.”
It feels like there should be an easier or better way to do this. Or at least somehow not have to store two separate datasets. One thing to note is that in these implementations, all the necessary data is stored, in order, in a List<KeyValuePair<TKey, TVal>>. So a very simple option would be to just use this list of pairs. For exmaple:
List<KeyValuePair<Stuff, Things>> stuffAndThings =
new List<KeyValuePair<Stuff, Things>>
Or if you want a new type for it:
public class PairList<TKey, TVal> : List<KeyValuePair<TKey, TVal>
That’s it! Now you have a “leaner” implementation of ordered key-value pairs. But you can’t look things up by key. And there’s nothing to validate that the keys are all unique. And you have to use it like this:
PairList<Stuff, Things> stuffAndThings = new PairList<Stuff, Things>
stuffAndThings.Add(new KeyValuePair<Stuff, Things>("stuff","things"));
Ew. This isn’t at all what we were talking about, but now is a good time to really think about what you actually need to do. For our initial example (putting together a list of key-value pairs and getting them out in order) this does just fine. And you can solve the ugly ‘Add’ problem easily by creating a new Add method:
public void Add(TKey key, TValue value)
{
base.Add(new KeyValuePair<TKey,TValue>(key, value));
}
Not bad.
But if we want to add the ability to look up by key, we have to do something like this:
public List<TKey> Keys
{
get
{
List<TKey> keys = new List<TKey>();
foreach(KeyValuePair<TKey, TValue> kvp in this)
{
keys.Add(kvp.Key);
}
return keys;
}
}
public TValue this[TKey key]
{
get
{
return this[Keys.IndexOf(key)].Value;
}
set
{
this[Keys.IndexOf(key)] = new KeyValuePair<TKey, TValue>(key, value);
}
}
Now it’s getting ugly. Especially because every time you access by key you have to generate a new list of keys. Super slow. It definitely looks like this is not the road to take for a true generic “OrderedDictionary” kind of functionality.
So lets look back at this abstract KeyedCollection<tkey, tval> class.
If you derive this class you have to implement:
protected override tkey GetKeyForItem(tval item)
Basically this means that it expects you to be using objects where the key is stored as part of the object (or can be systematically derived from it). This seems annoying at first, but might be a really good idea – you always have the object’s key value any time you need to access the object.
In particular, we can genericize things a bit to make it so we don’t have to create separate concrete types for every kind of thing we want to store:
public interface IHasKey<TKey>
{
TKey Key
{
get ;
set ;
}
}
public class KeyedObjectCollection<TKey, TVal> :
System.Collections.ObjectModel.KeyedCollection<TKey, TVal>
where TVal : IHasKey <TKey>
{
protected override TKey GetKeyForItem(TVal item)
{
return item.Key;
}
public void Add(TKey key, TVal item)
{
item.Key = key;
base.Add(item);
}
}
Lets try this out with our stuffAndThings example:
class KeyedThing: Thing, IHasKey< string>
{
public string key;
public KeyedThing(string keyText, string valueText)
: base(valueText)
{
key = keyText;
}
public string Key
{
get { return key; }
set { key = value ; }
}
}
class Program
{
static void Main(string [] args)
{
KeyedObjectCollection<string, KeyedThing> stuffAndThingsTwo =
new KeyedObjectCollection<string, KeyedThing>();
stuffAndThingsTwo.Add( new KeyedThing ("1: ", "One Thing"));
stuffAndThingsTwo.Add( new KeyedThing ("2: ", "Two Thing"));
stuffAndThingsTwo.Add( new KeyedThing ("3: ", "Red Thing"));
stuffAndThingsTwo.Add( new KeyedThing ("4: ", "Blue Thing"));
for (int i = 0; i < stuffAndThingsTwo.Count; i++)
{
Console.WriteLine(stuffAndThingsTwo[i].Key +
stuffAndThingsTwo[i].ToString());
}
}
}
Not bad…except we have to declare the key type for the KeyedObjectCollection as string, even though the KeyedThing class inherently require’s its key type to be string. That gets a bit redundant.
It turns out that’s fixable by being even more generic:
public class KeyedCollection<tkey, tval> :
System.Collections.ObjectModel.KeyedCollection<tkey, KeyValuePair<tkey, tval>>
{
protected override tkey GetKeyForItem(KeyValuePair<tkey, tval> item)
{
return item.Key;
}
public void Add(TKey key, TVal item)
{
item.Key = key;
base.Add(item);
}
}
Now when you access the collection by key, it will return a KeyValuePair. This will probably be OK, but it also means you have to add everything by KeyValuePair, so there are a few extra methods we may want to add:
public ICollection<tkey> Keys
{
get { return base.Dictionary.Keys; }
}
public ICollection<tval> Values
{
get
{
ICollection <tval> values = new List<tval>();
foreach(KeyValuePair<tkey, tval> kvp in base.Dictionary.Values)
{
values.Add(kvp.Value);
}
return values;
}
}
public void Add(tkey key, tval val)
{ base.Add( new KeyValuePair<tkey, tval>(key, val)); }
public void Insert(int index, tkey key, tval val)
{ base.Insert(index, new KeyValuePair<tkey, tval>(key, val)); }
public int IndexOf(tkey key)
{
for (int i = 0; i < base.Count; i++)
{
if (this[i].Key.Equals(key))
return i;
}
return -1;
}
Here we see that the base abstract KeyedCollection class contains internal Dictionary (base.Dictionary) and internal List (base.Items) objects. It’s looking quite familiar to what we were doing with the OrderedDictionary and KeyedList, but the Dictionary in the KeyedCollection class is storing Keys and KeyValuePairs instead of just Keys and Values, which is not ideal.
However, there’s a lot less code and room for error. And, for better or worse, it’s using a lot more functionality provided in the .NET framework class library. Why might this be good?
Looking at the OrderedDictionary implementation, there’s this property:
private List<KeyValuePair<TKey, TValue>> List
{
get
{
if (null == List)
{
_list = new List<KeyValuePair <TKey,TValue>>(_initialCapacity);
}
return _list;
}
}
Obviously it’s supposed to be if(null == _list) so it’s just a typo. But it’s one that doesn’t get caught until you actually run some code that calls the right methods on the object. This wouldn’t have happened if Microsoft had given you the class you needed.
After all debugging and testing is done, the OrderedDictionary and KeyedCollection classes are much cleaner implementations of a generic class for collections that is accessible by key and by index. However, there are times when the simple List<KeyValuePair<tkey, tval>> does everything we need, and others when some particular implementation of KeyedCollection may do the job best. This just point out again how a proper design process may save us a lot of refactoring down the road.