
Concurrent Lock Policies in Index Merging

In the lifecycle of a search engine, "Merge" is a continuous background process. When a merge completes, the system needs to replace old index segments with the new, unified segment. This switching process seems simple, but in high-concurrency read scenarios, it is fraught with peril.

Without locking, readers might see inconsistent data or encounter files that have been deleted prematurely. However, if the lock granularity is too coarse or held for too long, it causes query "jitter" or even complete stalls.

Today, we explore several common locking strategies and simulate a "Late-Locking" (OnSwitch) policy using Modern C++ (C++20), demonstrating how to maximize system throughput while ensuring data safety.

The Strategy Game: OnStart vs. OnSwitch

When designing merge locks, we typically have two main choices:

  1. OnStart Policy (Pessimistic Lock):

    • Acquire the lock at the very beginning of the merge task to prevent other processes from modifying these segments.
    • Pros: Simple implementation, high safety.
    • Cons: Extremely low concurrency. Merging is a time-consuming process (potentially minutes), and holding a lock this long blocks all structural changes.
  2. OnSwitch Policy (Optimistic/Late Lock):

    • Do not hold a lock for the majority of the merge time (reading old segments, writing new ones to a temporary directory).
    • Acquire the write lock only at the very last moment to "flip" the pointers.
    • Pros: Drastically reduces critical section time, with almost zero impact on queries.
    • Cons: Complex implementation. What if an old segment was deleted by another task during the merge? Requires complex reference counting or version checking mechanisms.
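For contrast, the OnStart policy fits in a few lines. The sketch below is illustrative (the `OnStartIndex` name and its members are hypothetical, not part of the implementation later in this post): the exclusive lock spans the entire merge, so every reader is blocked for its full duration.

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

// Hypothetical sketch of the OnStart (pessimistic) policy:
// the write lock is taken before the merge begins and held
// until the swap finishes, blocking all readers in between.
struct OnStartIndex {
    std::vector<std::string> segments{"seg_1", "seg_2"};
    mutable std::shared_mutex mtx;

    // The whole merge runs inside the critical section.
    void merge_on_start(const std::string& merged_id) {
        std::unique_lock lock(mtx);     // held for the WHOLE merge
        // ... expensive read/rewrite of the old segments would run
        //     here, with every search() blocked in the meantime ...
        segments.clear();               // drop the old segments
        segments.push_back(merged_id);  // install the merged result
    }

    std::vector<std::string> list() const {
        std::shared_lock lock(mtx);     // readers take the shared lock
        return segments;
    }
};
```

The safety argument is trivial here, which is exactly the appeal of OnStart; the cost is that the critical section is as long as the merge itself.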

Modern C++ Implementation: RAII and Late-Locking

We will use std::shared_mutex (a reader-writer lock available since C++17) together with C++20's std::jthread to demonstrate how to implement a safe OnSwitch logic.

#include <iostream>
#include <vector>
#include <string>
#include <thread>
#include <mutex>
#include <shared_mutex>
#include <chrono>

// Simulate an index segment
struct Segment {
    std::string id;
    size_t doc_count;
};

// Core state of the search system
class SearchIndex {
    std::vector<Segment> active_segments;
    mutable std::shared_mutex index_mutex; // Reader-Writer lock

public:
    SearchIndex() {
        active_segments = {{"seg_1", 100}, {"seg_2", 200}};
    }

    // Simulate query operation (Reader): Requires shared lock
    void search(const std::string& query) const {
        std::shared_lock lock(index_mutex);
        std::cout << "[Query] Searching for '" << query << "' in segments: ";
        for (const auto& seg : active_segments) {
            std::cout << seg.id << " ";
        }
        std::cout << std::endl;
        // Simulate query latency
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }

    // Get a snapshot of current segments (preparation for merge)
    std::vector<Segment> get_segments_snapshot() const {
        std::shared_lock lock(index_mutex);
        return active_segments;
    }

    // Key: Atomic Switch (OnSwitch Policy)
    // Acquire exclusive lock only at this final step
    bool apply_merge_result(const std::vector<std::string>& old_ids, const Segment& new_merged_segment) {
        std::unique_lock lock(index_mutex); // Acquire write lock

        // Double-Check:
        // Ensure old segments still exist and haven't been deleted by other threads
        for (const auto& old_id : old_ids) {
            bool found = false;
            for (const auto& seg : active_segments) {
                if (seg.id == old_id) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                std::cerr << "[Merge] Failed! Target segments disappeared." << std::endl;
                return false;
            }
        }

        // Execute Swap: Remove old segments, add new segment
        std::cout << "[Merge] Critical Section: Swapping segments..." << std::endl;
        // C++20 std::erase_if removes all old segments in one pass
        std::erase_if(active_segments, [&](const Segment& seg) {
            for (const auto& old_id : old_ids) {
                if (seg.id == old_id) return true;
            }
            return false;
        });
        active_segments.push_back(new_merged_segment);
        return true;
    }
};

// Simulate background merge task
void background_merge_task(SearchIndex& index) {
    // 1. Preparation Phase (No lock or short lock)
    auto snapshot = index.get_segments_snapshot();
    std::vector<std::string> target_ids;
    for (const auto& seg : snapshot) target_ids.push_back(seg.id);

    std::cout << "[Merge] Background merging started for " << target_ids.size() << " segments..." << std::endl;
    
    // Simulate expensive merge calculation (Holding NO locks!)
    std::this_thread::sleep_for(std::chrono::milliseconds(200)); 
    Segment new_seg{"seg_merged_v1", 300};

    // 2. Switch Phase (OnSwitch - Short Write Lock)
    if (index.apply_merge_result(target_ids, new_seg)) {
        std::cout << "[Merge] Success! Index updated." << std::endl;
    }
}

int main() {
    SearchIndex index;

    // Start background merge thread (std::jthread joins automatically on destruction)
    std::jthread merger(background_merge_task, std::ref(index));

    // Simulate foreground high-concurrency queries
    for (int i = 0; i < 5; ++i) {
        index.search("user_query_" + std::to_string(i));
        std::this_thread::sleep_for(std::chrono::milliseconds(60));
    }

    return 0;
}

Code Walkthrough

In this C++ example, we demonstrate the essence of the OnSwitch policy:

  1. Lock-Free Calculation: The majority of the time in background_merge_task (the simulated sleep_for) is spent without holding any locks. This means query threads (search) can run unimpeded while the merge is progressing.
  2. Atomic Switch: apply_merge_result uses std::unique_lock. This is the only place in the entire process that blocks readers, but the operations performed here are extremely fast (only in-memory vector manipulation), typically in the microsecond range.
  3. Safety Check: After acquiring the write lock, we must verify the state of the environment again ("Double-Check"), because the outside world may have changed during the unlocked computation period (e.g., another task or an admin may have deleted one of the old segments).
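The Double-Check above compares segment IDs one by one. The version-checking mechanism mentioned earlier can replace it with a single integer comparison. The sketch below is hypothetical (VersionedIndex and its method names are illustrative, not part of the implementation above): every structural change bumps a monotonically increasing version, and a merge commits only if the version still matches the one recorded at snapshot time.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical sketch: a version counter replaces the per-segment
// Double-Check. The merger records the version with its snapshot;
// any structural change bumps it, so a stale merge is rejected
// with one comparison instead of a scan.
class VersionedIndex {
    uint64_t version_ = 0;
    mutable std::shared_mutex mtx_;

public:
    // Snapshot phase: remember the version along with the segment list.
    uint64_t snapshot_version() const {
        std::shared_lock lock(mtx_);
        return version_;
    }

    // Any other structural change (segment deletion, flush, ...) bumps it.
    void structural_change() {
        std::unique_lock lock(mtx_);
        ++version_;
    }

    // Switch phase: commit only if nothing changed since the snapshot.
    bool try_commit(uint64_t snapshot_version) {
        std::unique_lock lock(mtx_);
        if (version_ != snapshot_version) return false; // stale merge, abort
        ++version_; // the swap itself is also a structural change
        return true;
    }
};
```

The trade-off is granularity: any unrelated structural change invalidates an in-flight merge, whereas the per-segment check only aborts when one of the merge's own inputs disappeared.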

Conclusion

In the design of high-concurrency systems, locking is not just about safety; it is an art of performance. The OnSwitch (Late-Locking) policy drastically improves system concurrency by minimizing the scope of critical sections.

Although it is more complex to implement than a single global lock (you need to handle state invalidation, version conflicts, and similar hazards), in search systems that demand ultra-low latency this complexity is worth it. Modern C++'s shared_mutex and RAII mechanisms let us express this complex synchronization logic more elegantly and safely.