Optimizing Hash Joins
Official documentation for the SIGMOD 2025 Programming Contest entry. A deep dive into high-performance computing techniques for main-memory database operations.
The Team
Andreakis Dimitrios
EncodedMind
Vasileiou Evaggelos
VangelisVas
Kolokouras Apostolos
TolisKlk
The Problem
The SIGMOD 2025 Programming Contest task is to implement a high-performance Hash Join algorithm.
The contest provides a baseline implementation of a Hash Join. The primary goal is to optimize this code to minimize the total execution time of the algorithm. Participants are free to use any optimization techniques available, including parallelization, vectorization, and cache-conscious algorithms.
Simplified Explanation
Performance Results
Comparison of average execution time across all optimization stages.
Optimization Evolution
Lower is better. Y-Axis: Execution Time (ms)
Average Runtimes & Improvements
5 runs averaged| Optimization | Avg Time (ms) | vs Previous | vs Base |
|---|---|---|---|
| 0. Base (Unordered Map) | 160,048 | - | - |
| 1. Robinhood HT | 183,942 | +14.93% | +14.93% |
| 2. Cuckoo HT | 178,869 | +11.76% | +11.76% |
| 3. Hopscotch HT | 173,872 | +8.64% | +8.64% |
| 4. Late Materialization | 105,546 | -34.05% (vs Base) | -34.05% |
| 5. Column Store | 62,144 | -41.12% | -61.17% |
| 6. No Root IR | 58,901 | -5.22% | -63.20% |
| 7. Unchained Table | 43,311 | -26.47% | -72.94% |
| 8. Indexing | 13,017 | -69.95% | -91.87% |
| 9. Building Parallelization | 12,919 | -0.75% | -91.93% |
| 10. Probing Parallelization | 7,798 | -39.64% | -95.13% |
| 11. Work Stealing (Final) | 6,208 | -20.39% | -96.12% |
Final Execution Breakdown (Fastest)
Optimization 11: Work Stealing