Heap Dict: The Art of Delayed Repair
In scenarios requiring both fast lookup and priority ordering, how do we design a data structure that balances both operations efficiently? Today we analyze an industrial-grade Heap Dict implementation to understand how designers use delayed repair to balance performance.
The Core Problem
Traditional heaps require O(log N) to maintain heap property after priority updates, while hash tables provide O(1) lookup but can't sort directly. When we need:
- Get extremum by priority
- Fast lookup and update by key
Design: Hash + Heap Hybrid
template <class TKey, class TPriority, ...>
class THeapDict {
// Dual data structure: hash for position, heap for priority
THashMap<TKey, size_t> PositionsInHeap; // key -> heap position
TVector<std::pair<TKey, TPriority>> Heap; // heap storage
TMaybe<size_t> ModifiedPosition; // delayed repair marker
};
Core Design: Delayed Repair
void ConsiderHeapifying() {
if (!ModifiedPosition) {
return;
}
Heapify(*ModifiedPosition); // lazy rebuild
ModifiedPosition.Clear();
}
When an element is modified, only the position is marked. The heap is actually repaired only when accessed next time.
Trade-off Analysis
Advantages
- O(1) Lookup: Hash table for direct positioning
- Delayed Repair: Reduces unnecessary heap adjustments
- Update Friendly: Priority changes only need marking, no immediate rebuild
Costs
- Space Overhead: Additional hash table maintenance
- Complexity: Delayed repair increases code complexity
- Consistency: Need to ensure timely access after modification
Clean Room Reimplementation: Go Implementation
package heapdict
import "container/heap"
type HeapItem struct {
Key string
Priority int
}
type HeapDict struct {
items map[string]int
positionMap map[string]int
heap []HeapItem
}
func New() *HeapDict {
return &HeapDict{
items: make(map[string]int),
positionMap: make(map[string]int),
heap: make([]HeapItem, 0),
}
}
func (h *HeapDict) Push(key string, priority int) {
if _, exists := h.items[key]; exists {
h.items[key] = priority
h.rebuild()
return
}
h.items[key] = priority
heap.Push(h, HeapItem{Key: key, Priority: priority})
}
func (h *HeapDict) Pop() HeapItem {
item := heap.Pop(h).(HeapItem)
delete(h.items, item.Key)
delete(h.positionMap, item.Key)
return item
}
func (h *HeapDict) rebuild() {
h.heap = make([]HeapItem, 0, len(h.items))
for key, priority := range h.items {
h.heap = append(h.heap, HeapItem{Key: key, Priority: priority})
}
for i := len(h.heap)/2 - 1; i >= 0; i-- {
h.siftDown(i)
}
}
Summary
The delayed repair mechanism in Heap Dict embodies the classic lazy evaluation principle:
- Hybrid Data Structure: Hash table + heap, combining strengths
- Delayed Repair: Mark on modification, repair on access
- Trade-off: Space for time, reducing unnecessary computation
This design is particularly effective in scenarios requiring frequent priority updates (like schedulers, leaderboards).