Optimization #1

Robin Hood Hashing

Our first major optimization attempt replaced the standard C++ container with a custom open-addressing hash table designed to minimize variance in probe sequence lengths.

Hashing Fundamentals

In the baseline solution, the hash table is implemented using C++/STL's std::unordered_map.

Hashing is a technique that maps data to a fixed-size array using a hash function. This function takes an input (key) and returns an index in the hash table where the corresponding value is stored. This makes searching and accessing elements extremely efficient.

The core challenge is Collision Handling: managing scenarios where multiple keys hash to the same index. Our optimization uses advanced open addressing to solve this.

The Goal

Minimize the variance in the distance between every key and its initial "home" slot.

Variance Efficiency

"Rob from the rich to give to the poor"

The Robin Hood Algorithm

In Robin Hood Hashing, when a collision occurs, the algorithm ensures that keys are placed as close to their ideal position as possible. If a new key is inserted further from its home position than an existing key, it displaces the current key.

This process effectively "robs" keys that are close to their ideal positions (the "rich") to make room for keys that are far away (the "poor"), minimizing the overall distance and improving the hash table's efficiency.

Probe Sequence Length (PSL)

PSL refers to the number of steps required to find a key in the hash table, especially when collisions occur. While average search time in linear probing is \( O(1) \), the actual steps can vary significantly.

b[p].v → Value stored in bucket \( p \)
b[p].psl → PSL of value in bucket \( p \)

Insertion Logic

Start at the bucket where value \( v \) hashes to.

If bucket is occupied, compare PSL. If current_PSL < new_PSL, swap values. The new value takes the spot, the old value continues searching.

Continue checking subsequent buckets until an empty (null) bucket is found.

Insert the final displaced value into the empty bucket.

Visualization

00:00

Visualization of the Robin Hood hashing algorithm.

The Result

By continuously swapping keys based on their PSL values, the algorithm prevents clustering and ensures no key has an unreasonably long probe sequence. This keeps the table balanced, ensuring efficient lookups and minimizing the maximum distance any key travels from its ideal position.