Ticket Locks: The Architecture of Predictability
In the deep waters of systems programming, raw speed is often less important than predictable latency.
Standard spinlocks, typically built around atomic Compare-and-Swap (CAS) loops, suffer from a critical flaw under high contention: unfairness. When a lock is released, the hardware arbiter decides who gets it next based on processor topology or sheer luck, not waiting time. This leads to starvation and erratic P99 latency spikes.
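For contrast, here is what such an unfair lock looks like; a minimal C11 sketch (illustrative, not tied to any particular library) of a CAS-based spinlock where every waiter hammers the same flag:

```c
#include <stdatomic.h>
#include <stdbool.h>

// Minimal unfair spinlock: all waiters retry CAS on one shared flag.
// Whoever's CAS lands first after a release wins -- arrival order is ignored.
typedef struct { atomic_bool held; } spin_lock;

static void spin_acquire(spin_lock *l) {
    bool expected = false;
    while (!atomic_compare_exchange_weak_explicit(
            &l->held, &expected, true,
            memory_order_acquire, memory_order_relaxed)) {
        expected = false; // a failed CAS overwrote `expected`; reset and retry
    }
}

static void spin_release(spin_lock *l) {
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Nothing here records how long a thread has been waiting, which is exactly the gap the ticket lock closes.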
Enter the Ticket Lock. It trades the chaotic scramble of CAS for the orderly discipline of a FIFO queue, ensuring that every thread gets its turn.
1. The Deli Counter Model
The mechanism is simple, mirroring a physical "take a number" system:
- next_ticket: The dispenser. Threads atomically fetch-and-increment this value to get their position in line.
- current_ticket: The display. Shows who is currently being served.
Threads spin on current_ticket until it matches their hand-held number. This guarantees strict FIFO ordering—no one cuts in line.
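The protocol can be modeled in a few lines; a minimal C11 sketch (names like deli_counter and take_ticket are illustrative, standing in for the Zig implementation shown in the next section):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Minimal model of the deli counter: two monotonically increasing counters.
typedef struct {
    atomic_uint_fast64_t next_ticket;    // the dispenser
    atomic_uint_fast64_t current_ticket; // the display
} deli_counter;

// Take a number: one atomic fetch-and-increment fixes our place in line.
static uint64_t take_ticket(deli_counter *d) {
    return atomic_fetch_add_explicit(&d->next_ticket, 1, memory_order_relaxed);
}

// The predicate each waiter spins on: does the display show my number?
static bool is_my_turn(deli_counter *d, uint64_t my_ticket) {
    return atomic_load_explicit(&d->current_ticket,
                                memory_order_acquire) == my_ticket;
}

// Serving done: call the next number.
static void serve_next(deli_counter *d) {
    atomic_fetch_add_explicit(&d->current_ticket, 1, memory_order_release);
}
```

Because take_ticket hands out strictly increasing numbers and serve_next advances the display by exactly one, service order equals arrival order.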
2. Zig Implementation: Explicit Memory Layout
One of Zig’s superpowers is its explicit control over memory alignment. In high-performance locking, where variables sit in memory matters as much as the algorithm itself.
Here is a production-grade implementation that respects CPU cache lines:
const std = @import("std");
const atomic = std.atomic;

/// TicketLock: A fair synchronization primitive.
/// Enforces strict FIFO ordering via dual counters.
pub const TicketLock = struct {
    // Critical optimization: align counters to 64 bytes (a typical cache line)
    // to prevent false sharing between the dispenser and the display.
    next_ticket: atomic.Value(u64) align(64) = atomic.Value(u64).init(0),
    current_ticket: atomic.Value(u64) align(64) = atomic.Value(u64).init(0),

    /// Acquire: take a ticket and wait.
    pub fn acquire(self: *TicketLock) void {
        // 1. Atomic fetch-and-add.
        // This single atomic read-modify-write establishes the global order.
        const my_ticket = self.next_ticket.fetchAdd(1, .monotonic);

        // 2. Spin wait (read-only).
        // Spin until the display shows our number.
        while (self.current_ticket.load(.acquire) != my_ticket) {
            // Hint to the CPU that we are spinning, to save power/pipeline slots.
            std.atomic.spinLoopHint();
        }
    }

    /// Release: call the next number.
    pub fn release(self: *TicketLock) void {
        // 3. Store-release.
        // Only the holder writes current_ticket, so an unordered load is safe;
        // the release store unblocks the thread holding (my_ticket + 1).
        const current = self.current_ticket.load(.unordered);
        self.current_ticket.store(current + 1, .release);
    }
};
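For readers more at home in C, the same algorithm translates almost mechanically to C11 atomics (a sketch for comparison, not the author's code; _Alignas(64) mirrors Zig's align(64), and 64 bytes is an assumption about the target CPU):

```c
#include <stdatomic.h>
#include <stdint.h>

// Illustrative C11 counterpart of the Zig TicketLock.
typedef struct {
    _Alignas(64) atomic_uint_fast64_t next_ticket;
    _Alignas(64) atomic_uint_fast64_t current_ticket;
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l) {
    // Fetch-and-add establishes the global FIFO order.
    uint64_t my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    // Read-only spin until the display shows our number.
    while (atomic_load_explicit(&l->current_ticket,
                                memory_order_acquire) != my) {
        // (a real implementation would issue a pause hint here)
    }
}

static void ticket_lock_release(ticket_lock *l) {
    // Only the holder writes current_ticket, so a relaxed read is safe;
    // the release store publishes the critical section to the next waiter.
    uint64_t cur = atomic_load_explicit(&l->current_ticket,
                                        memory_order_relaxed);
    atomic_store_explicit(&l->current_ticket, cur + 1, memory_order_release);
}
```

After N balanced acquire/release pairs, both counters read N, which makes the invariant easy to sanity-check in a test.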
3. The Cost of False Sharing
Notice the align(64) directives. These aren't just for show.
In modern multi-core processors, the L1/L2 cache operates on "cache lines" (usually 64 bytes). If next_ticket (written by arriving threads) and current_ticket (written by the lock holder) reside on the same cache line, cores will constantly invalidate each other's caches via the MESI protocol, even though they are touching different variables. This phenomenon is called False Sharing, and it can decimate performance.
By forcing alignment, we ensure these hot variables live in separate cache lines, minimizing unnecessary bus traffic.
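The effect of the alignment directive is visible in the struct layout itself; a small C sketch (assuming a 64-byte line, as the Zig code does):

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

// Without alignment: both counters sit 8 bytes apart,
// inside the same 64-byte cache line.
typedef struct {
    uint64_t next_ticket;
    uint64_t current_ticket;
} packed_lock;

// With alignment: each counter starts its own 64-byte line,
// mirroring the align(64) directives in the Zig struct.
typedef struct {
    alignas(64) uint64_t next_ticket;
    alignas(64) uint64_t current_ticket;
} padded_lock;
```

The padded variant costs 128 bytes instead of 16, a classic space-for-coherence-traffic trade.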
4. The Trade-off: Global Spinning
While Ticket Locks solve fairness, they are not a silver bullet. They still suffer from global spinning:
- All waiting threads spin on the same variable (current_ticket).
- When current_ticket is updated, every waiting core invalidates its cache line and re-reads from memory.
- On a machine with 100+ cores, this "thundering herd" of cache coherence traffic can saturate the interconnect.
For extreme scale, MCS locks (where each thread spins on its own local variable) are the superior choice, albeit at the cost of complexity.
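To make the contrast concrete, here is a sketch of the MCS idea in C11 (names like mcs_node and mcs_acquire are illustrative): each waiter enqueues a node and spins on a flag in its own node, so a release touches exactly one waiter's cache line instead of invalidating every spinning core.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked; // each waiter spins on its OWN flag
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail; // last waiter in the queue, or NULL if free
} mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    // Swap ourselves in as the new tail; the old tail is our predecessor.
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me,
                                              memory_order_acq_rel);
    if (prev == NULL) return; // queue was empty: lock acquired
    atomic_store_explicit(&prev->next, me, memory_order_release);
    // Local spinning: only our own node's flag, nobody else's.
    while (atomic_load_explicit(&me->locked, memory_order_acquire)) {}
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        // No visible successor: try to swing tail back to NULL.
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        // A waiter is mid-enqueue; wait for it to link itself in.
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL) {}
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

The complexity cost is visible in the API: callers must supply a queue node per acquisition, which is why ticket locks remain attractive at moderate core counts.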
Conclusion
The Ticket Lock is a beautiful example of how algorithmic fairness can reduce tail latency. It teaches us that:
- Fairness is an architectural choice: We accept slightly higher overhead for stable, predictable performance.
- Hardware empathy is non-negotiable: Simple alignment tweaks can prevent invisible performance cliffs.
- Order matters: In a chaotic system, a simple queue is often the most robust solution.