CTRL + S our souls
CTRL + S our souls

Optimizing Hash Joins

Official documentation for the SIGMOD 2025 Programming Contest entry. A deep dive into high-performance computing techniques for main-memory database operations.

The Team

Andreakis Dimitrios

Andreakis Dimitrios

EncodedMind

View Profile
Vasileiou Evaggelos

Vasileiou Evaggelos

VangelisVas

View Profile
Kolokouras Apostolos

Kolokouras Apostolos

TolisKlk

View Profile

The Problem

The SIGMOD 2025 Programming Contest task is to implement a high-performance Hash Join algorithm.

The contest provides a baseline implementation of a Hash Join. The primary goal is to optimize this code to minimize the total execution time of the algorithm. Participants are free to use any optimization techniques available, including parallelization, vectorization, and cache-conscious algorithms.

Base Implementation

The starting point for our optimizations.

SIGMOD-25-Programming-Contest/base

Performance Results

Comparison of average execution time across all optimization stages.

Optimization Evolution

Lower is better. Y-Axis: Execution Time (ms)

Average Runtimes & Improvements

5 runs averaged
Optimization Avg Time (ms) vs Previous vs Base
0. Base (Unordered Map) 160,048 - -
1. Robinhood HT 183,942 +14.93% +14.93%
2. Cuckoo HT 178,869 +11.76% +11.76%
3. Hopscotch HT 173,872 +8.64% +8.64%
4. Late Materialization 105,546 -34.05% (vs Base) -34.05%
5. Column Store 62,144 -41.12% -61.17%
6. No Root IR 58,901 -5.22% -63.20%
7. Unchained Table 43,311 -26.47% -72.94%
8. Indexing 13,017 -69.95% -91.87%
9. Building Parallelization 12,919 -0.75% -91.93%
10. Probing Parallelization 7,798 -39.64% -95.13%
11. Work Stealing (Final) 6,208 -20.39% -96.12%

Final Execution Breakdown (Fastest)

Optimization 11: Work Stealing

Need help understanding our code? 🤖