堆字典的延迟修复机制

在需要同时支持快速查找和优先级排序的场景中，如何设计一个兼顾两种操作效率的数据结构？今天我们来分析一个工业级**堆字典（Heap Dict）**实现，看看设计者如何通过延迟修复机制来平衡性能。

核心问题

传统的堆结构在优先级更新后需要 O(log N) 的时间复杂度来维护堆属性，而哈希表虽然查找 O(1)，但无法直接排序。如果需要同时支持：

按优先级获取最值
按 key 快速查找和更新

设计方案：哈希 + 堆的混合结构

template <class TKey, class TPriority, ...>
class THeapDict {
    // 双数据结构：哈希表维护位置，堆维护优先级顺序
    THashMap<TKey, size_t> PositionsInHeap;  // key -> heap position
    TVector<std::pair<TKey, TPriority>> Heap; // 堆存储
    
    TMaybe<size_t> ModifiedPosition;  // 延迟修复标记
};

核心设计：延迟修复

void ConsiderHeapifying() {
    if (!ModifiedPosition) {
        return;
    }
    Heapify(*ModifiedPosition);  // 惰性重组堆
    ModifiedPosition.Clear();
}

当元素被修改时，只记录位置，在下次访问堆时才真正修复堆结构。

权衡分析

优势

查找 O(1)：哈希表直接定位
延迟修复：减少不必要的堆调整
更新友好：修改优先级只需标记，无需立即重建

代价

空间开销：需要额外维护哈希表
复杂度：延迟修复增加代码复杂度
一致性：需要确保修改后及时访问

净室重构：Go 实现

package heapdict

import "container/heap"

type HeapItem struct {
    Key      string
    Priority int
}

type HeapDict struct {
    items       map[string]int      // key -> priority
    positionMap map[string]int      // key -> heap position
    heap        []HeapItem
}

func New() *HeapDict {
    return &HeapDict{
        items:       make(map[string]int),
        positionMap: make(map[string]int),
        heap:        make([]HeapItem, 0),
    }
}

func (h *HeapDict) Push(key string, priority int) {
    if _, exists := h.items[key]; exists {
        h.items[key] = priority
        // 标记需要修复（简化版本：直接重建）
        h.rebuild()
        return
    }
    
    h.items[key] = priority
    heap.Push(h, HeapItem{Key: key, Priority: priority})
}

func (h *HeapDict) Pop() HeapItem {
    item := heap.Pop(h).(HeapItem)
    delete(h.items, item.Key)
    delete(h.positionMap, item.Key)
    return item
}

func (h *HeapDict) Top() HeapItem {
    return h.heap[0]
}

func (h *HeapDict) Find(key string) (int, bool) {
    priority, exists := h.items[key]
    return priority, exists
}

func (h *HeapDict) rebuild() {
    // 延迟修复：批量重建堆
    h.heap = make([]HeapItem, 0, len(h.items))
    for key, priority := range h.items {
        h.heap = append(h.heap, HeapItem{Key: key, Priority: priority})
    }
    // 使用堆初始化
    for i := len(h.heap)/2 - 1; i >= 0; i-- {
        h.siftDown(i)
    }
}

func (h *HeapDict) Len() int { return len(h.heap) }
func (h *HeapDict) Less(i, j int) bool { return h.heap[i].Priority < h.heap[j].Priority }
func (h *HeapDict) Swap(i, j int) {
    h.heap[i], h.heap[j] = h.heap[j], h.heap[i]
    h.positionMap[h.heap[i].Key] = i
    h.positionMap[h.heap[j].Key] = j
}

func (h *HeapDict) PushItem(x interface{}) {
    item := x.(HeapItem)
    h.heap = append(h.heap, item)
    h.positionMap[item.Key] = len(h.heap) - 1
}

func (h *HeapDict) PopItem() interface{} {
    old := h.heap
    n := len(old)
    item := old[n-1]
    h.heap = old[0 : n-1]
    return item
}

总结

堆字典的延迟修复机制体现了惰性计算的经典思想：

混合数据结构：哈希表 + 堆，各取所长
延迟修复：修改只标记，访问时修复
权衡：空间换时间，减少不必要的计算

这种设计在需要频繁更新优先级的场景（如调度器、排行榜）中非常有效。