Friday, October 15, 2010

HashMap Vs. TreeMap

This is one of my favorite programming interview questions: "What's the difference between Hash Tables and Binary Search Trees?" (or: "HashMap vs. TreeMap"). Let's see...


Hash Tables (HashMap)Binary Search Trees (TreeMap)
AlgorithmKeys are mapped to values by using hash functions.

Hash functions transform the key to a numeric index (usually by calculating an integer value from the key first, and then applying a "modulo arraysize" operation). This index identifies an array element ("bucket"), where all corresponding values reside (values which all stem from keys with equal hash value).

The key must still be looked up within the bucket, but as hashing algorithms are supposed to ensure a good hash value distribution, hence small bucket size, this lookup is very fast.

The bucket array may have to be resized dynamically when a certain load factor is reached.
Node-based tree data structure, where every node contains one record (key-only, or key and value), and has a left subtree (with smaller keys) and a right subtree (with larger keys); hence keys must be comparable.

Self-balancing trees, like red-black-trees, keep their height as small as possible for fast searching and inserting.
Java ImplementationsHashtable
HashMap
HashSet
TreeMap
.NET ImplementationsHashtable
Dictionary
HashSet
SortedDictionary
SortedList
C++ STL Implementationsstd::unordered_map
std::unordered_set
std::map
SortedNoYes
Range SearchNoYes
Runtime ComplexityInserting: O(1) (normal case), O(N) (worst case, only with bad hash algorithm)

Searching: O(1) (normal case), O(N) (worst case, only with bad hash algorithm)

More details
Inserting: O(log(n)) (when balanced)

Searching: O(log(n)) (when balanced)

More details
Implementation NotesEqual objects must produce the same hash code, however unequal objects need not produce distinct hash codes. Equals() implementations must be reflexive, symmetric, transitive, consistent. More details.

.NET's ValueType.Equals() and GetHashCode() methods are based on reflection, hence slow; you should provide your own implementation in structs.
CompareTo() implementations must be reflexive, symmetric, transitive, consistent. More details.