Cuckoo Hashing
Our second optimization approach involved a multiple-choice hashing scheme designed to provide worst-case constant lookup time.
Hashing Fundamentals
In the baseline solution, the hash table is implemented using C++/STL's std::unordered_map.
Hashing is a technique that maps data to a fixed-size array using a hash function. This function takes an input (key) and returns an index in the hash table where the corresponding value is stored. This makes searching and accessing elements extremely efficient.
The core challenge is Collision Handling: managing scenarios where multiple keys hash to the same index. Cuckoo hashing solves this by providing multiple potential homes for each key.
The Concept
Named after the cuckoo bird, which pushes other birds' eggs out of nests to make room for its own.
The Cuckoo Algorithm
The basic idea is that instead of having a single hash table \( T \) and one hash function \( h \) (as in the classic case), we use two tables \( T_1, T_2 \) and two hash functions \( h_1, h_2 \).
A key \( k \), instead of having one position \( T[h(k)] \), will have two potential locations: it will reside either at \( T_1[h_1(k)] \) or at \( T_2[h_2(k)] \).
Conflict Resolution
Suppose we want to insert an element \( k_1 \).
-
1
We always attempt to place \( k_1 \) in its position in the first table, \( T_1[h_1(k_1)] \). If this spot is empty, we are done.
-
2
If occupied by another element \( k_2 \) (where \( h_1(k_1) = h_1(k_2) \)), we insert \( k_1 \) anyway, essentially evicting \( k_2 \).
-
3
\( k_2 \) must now move to the other table, specifically to position \( T_2[h_2(k_2)] \). If that is empty, we are done.
-
4
If \( T_2[h_2(k_2)] \) is occupied by \( k_3 \), we evict \( k_3 \) back to the first table. This "kicking" process continues until an empty slot is found.
Rehashing & Cycles
There are two scenarios where a full rehash is required:
Visualization
Visualization of the Cuckoo hashing algorithm showing the displacement chain.