The Game of Hash: Probing Strategy Design in Industrial Hash Tables
Hash tables are one of the most fundamental and important data structures in computer science. However, hash collisions are inevitable — when two different keys compute to the same hash value, how do we resolve this? Industrial hash tables use different probing strategies to handle collisions. Each strategy has its unique trade-offs: some are fast but prone to clustering, some reduce clustering but may not cover all buckets. Today we'll analyze the probing strategy implementation in an industrial hash table library.
The Core Problem: Resolving Hash Collisions
The fundamental challenges in hash tables:
- Collision resolution: Multiple keys map to the same slot
- Clustering effect: Consecutive probing leads to performance degradation
- Probe coverage: Ensuring all available buckets can be traversed
The industrial solution is open addressing + configurable probing strategies:
- Linear probing: Simple and fast
- Quadratic probing: Reduces clustering
- Dense probing: Balanced performance
Core Design in Industrial Implementation
In an industrial C++ library, I found a highly configurable hash table implementation. Its probing strategy design is remarkably pragmatic:
Design One: Linear Probing
class TLinearProbing {
public:
template <class SizeFitter, class F>
static auto FindBucket(SizeFitter sf, size_t idx, size_t sz, F f) {
idx = sf.EvalIndex(idx, sz);
while (!f(idx)) {
idx = sf.EvalIndex(++idx, sz);
}
return idx;
}
};
Choice: Use linear probing
Trade-off considerations:
- Pros: Simple implementation, good cache locality
- Cons: Prone to primary clustering
Design Two: Quadratic Probing
class TQuadraticProbing {
public:
template <class SizeFitter, class F>
static auto FindBucket(SizeFitter sf, size_t idx, size_t sz, F f) {
idx = sf.EvalIndex(idx, sz);
size_t k = 0;
while (!f(idx)) {
idx = sf.EvalIndex(idx + 2 * ++k - 1, sz);
}
return idx;
}
};
Choice: Use quadratic probing
Trade-off considerations:
- Pros: Reduces clustering
- Cons: May not cover all buckets
Design Three: Dense Probing
class TDenseProbing {
public:
template <class SizeFitter, class F>
static auto FindBucket(SizeFitter sf, size_t idx, size_t sz, F f) {
idx = sf.EvalIndex(idx, sz);
size_t k = 0;
while (!f(idx)) {
idx = sf.EvalIndex(idx + ++k, sz);
}
return idx;
}
};
Choice: Use dense probing
Trade-off considerations:
- Pros: Between linear and quadratic
- Cons: Slightly more complex
Clean-room Reimplementation: Go Implementation
To demonstrate the design thinking, I reimplemented the core logic in Go:
package main
import (
"fmt"
"hash/fnv"
)
type ProbingStrategy interface {
FindBucket(initialIndex, size, attempt int) int
}
type LinearProbing struct{}
func (p LinearProbing) FindBucket(initialIndex, size, attempt int) int {
return (initialIndex + attempt) % size
}
type QuadraticProbing struct{}
func (p QuadraticProbing) FindBucket(initialIndex, size, attempt int) int {
return (initialIndex + attempt + attempt*attempt) % size
}
type DenseProbing struct{}
func (p DenseProbing) FindBucket(initialIndex, size, attempt int) int {
return (initialIndex + attempt*(attempt+1)/2) % size
}
func main() {
keys := []string{"apple", "banana", "cherry", "date", "elderberry"}
hashVals := make([]int, len(keys))
for i, key := range keys {
h := fnv.New32a()
h.Write([]byte(key))
hashVals[i] = int(h.Sum32()) % 16
}
strategies := []struct {
name string
probing ProbingStrategy
}{
{"Linear", LinearProbing{}},
{"Quadratic", QuadraticProbing{}},
{"Dense", DenseProbing{}},
}
for _, s := range strategies {
fmt.Printf("\n--- %s Probing ---\n", s.name)
for i, key := range keys {
initialIdx := hashVals[i]
fmt.Printf("%s: ", key)
for attempt := 0; attempt < 4; attempt++ {
idx := s.probing.FindBucket(initialIdx, 16, attempt)
fmt.Printf("%d ", idx)
}
fmt.Println()
}
}
}
Output:
--- Linear Probing ---
apple: 15 0 1 2
banana: 0 1 2 3
--- Quadratic Probing ---
apple: 15 1 5 11
banana: 0 2 6 12
--- Dense Probing ---
apple: 15 0 2 5
banana: 0 1 3 6
When to Use Different Probing Strategies
Linear probing:
- Good for low load factors
- Simple scenarios requiring high performance
Quadratic probing:
- Good for high load factors
- Scenarios needing to reduce clustering
Dense probing:
- Balanced performance and complexity
- Scenarios needing good cache locality
Summary
Probing strategy design in industrial hash tables is full of trade-offs:
- Simplicity vs. performance: Linear is simplest but has clustering issues
- Clustering vs. coverage: Quadratic reduces clustering but may not cover all buckets
- Cache vs. distribution: Dense strikes a balance
In Go, we can implement similar designs more concisely, but the core trade-offs remain the same — there's no perfect probing strategy, only choices that fit the scenario.