The Art of Asynchronous DNS Caching
In high-performance networking, DNS resolution is often the silent killer of throughput. A naive implementation that calls a blocking function like getaddrinfo on the request path can turn a sub-millisecond request into a multi-second ordeal whenever the upstream resolver lags, because the calling thread (or, in an event-driven server, the entire event loop) sits idle until the answer arrives. To build a robust load balancer, you don't just need a cache; you need an asynchronous state machine.
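The following Python asyncio sketch makes the cost concrete. The function slow_blocking_lookup is a hypothetical stand-in for a getaddrinfo call against a lagging resolver, simulated with a 200 ms sleep. Calling it directly inside a coroutine serializes every request on the event loop. asyncio.to_thread moves it to a worker thread instead, and CPython's own loop.getaddrinfo takes the same thread-pool approach. Offloading only removes the stall, though: each request still issues its own upstream query, which is why a cache is still needed.

```python
import asyncio
import time


def slow_blocking_lookup(host: str) -> str:
    """Hypothetical stand-in for a blocking getaddrinfo call on a lagging resolver."""
    time.sleep(0.2)  # simulate 200 ms of upstream latency
    return "192.0.2.1"  # documentation-range address (RFC 5737)


async def handle_blocking(host: str) -> str:
    # Calling the blocking function directly freezes the whole event loop.
    return slow_blocking_lookup(host)


async def handle_offloaded(host: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the call in a worker thread,
    # so the event loop keeps serving other requests meanwhile.
    return await asyncio.to_thread(slow_blocking_lookup, host)


async def timed(handler, n: int = 5) -> float:
    """Run n concurrent requests through handler and return wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler("example.com") for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:  {asyncio.run(timed(handle_blocking)):.2f}s")   # roughly n * 0.2 s
    print(f"offloaded: {asyncio.run(timed(handle_offloaded)):.2f}s")  # roughly 0.2 s
```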
The Thundering Herd Problem
When the cached entry for a popular domain reaches the end of its TTL, your system faces a coordination problem: thousands of concurrent requests notice the cache miss at the same moment. Without proper orchestration, each one triggers its own DNS lookup for the same name, producing a "thundering herd" that overwhelms your resolver and wastes system resources.
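The herd is easy to reproduce. In this minimal Python asyncio sketch, NaiveCache and CountingResolver are hypothetical names, and the upstream query is simulated with a 50 ms sleep. Every request checks the cache before any lookup has finished, so 1,000 concurrent requests for the same cold name produce 1,000 upstream queries.

```python
import asyncio


class CountingResolver:
    """Hypothetical upstream resolver that counts the queries it receives."""

    def __init__(self) -> None:
        self.queries = 0

    async def lookup(self, host: str) -> str:
        self.queries += 1
        await asyncio.sleep(0.05)  # simulated network round trip
        return "192.0.2.1"


class NaiveCache:
    """Check-then-fetch cache with no coordination between concurrent misses."""

    def __init__(self, resolver: CountingResolver) -> None:
        self.resolver = resolver
        self.entries: dict = {}

    async def get(self, host: str) -> str:
        if host in self.entries:  # every concurrent request sees the miss...
            return self.entries[host]
        addr = await self.resolver.lookup(host)  # ...and each queries upstream
        self.entries[host] = addr
        return addr


async def herd(n: int) -> int:
    """Fire n concurrent requests for one name; return the upstream query count."""
    resolver = CountingResolver()
    cache = NaiveCache(resolver)
    await asyncio.gather(*(cache.get("popular.example") for _ in range(n)))
    return resolver.queries


if __name__ == "__main__":
    print(asyncio.run(herd(1000)))  # one upstream query per concurrent request
```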
The Four States of Resolution
A mature DNS cache entry transitions through four critical states to manage this complexity:
- Idle/Not Resolved: The baseline state where the first requester takes ownership of the resolution task.
- In Progress: A resolution task is active. Subsequent requesters don't start new tasks; instead, they "park" themselves and wait for a signal.
- Resolved: The data is ready and served immediately until its TTL expires.
- Failed: Errors are cached for a short period (negative caching) so that requests do not hammer a failing upstream with constant retries.
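The four states map naturally onto a small decision function. The Python sketch below assumes a hypothetical Entry record with an absolute expiry time and treats an expired Resolved or Failed entry like Idle. Note that decide() also claims ownership by flipping the entry to In Progress. The check and the claim must happen atomically: in single-threaded asyncio code that holds as long as nothing awaits between them, and in threaded code it would require a lock.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    IDLE = auto()         # not resolved: the next requester becomes the owner
    IN_PROGRESS = auto()  # an owner is resolving: others park and wait
    RESOLVED = auto()     # addresses cached until expires_at
    FAILED = auto()       # error cached until expires_at (negative caching)


class Action(Enum):
    OWN = auto()    # start the lookup yourself
    WAIT = auto()   # park until the owner signals completion
    SERVE = auto()  # return the cached addresses
    RAISE = auto()  # return the cached error


@dataclass
class Entry:  # hypothetical cache record
    state: State = State.IDLE
    addresses: List[str] = field(default_factory=list)
    error: Optional[str] = None
    expires_at: float = 0.0


def decide(entry: Entry, now: float) -> Action:
    """Map an entry's state to a requester's action, claiming ownership if needed."""
    if entry.state is State.IN_PROGRESS:
        return Action.WAIT
    if entry.state is State.RESOLVED and now < entry.expires_at:
        return Action.SERVE
    if entry.state is State.FAILED and now < entry.expires_at:
        return Action.RAISE
    # Idle, or a Resolved/Failed entry whose TTL has lapsed: take ownership.
    entry.state = State.IN_PROGRESS
    return Action.OWN


if __name__ == "__main__":
    entry = Entry()
    print(decide(entry, now=0.0))  # first requester claims the lookup
    print(decide(entry, now=0.0))  # later requesters park
```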
Implementation Insight: Double-Check Logic
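The secret to a clean implementation lies in the "Double-Check" pattern. When a waiting request is woken by the completion signal, it should not assume the lookup succeeded; it should re-evaluate the entry's state from the top. By the time the waiter runs, the entry may be Resolved, Failed, or already expired and claimed by a new owner. Re-checking on every wake-up means the waiter always acts on what is actually cached, so no matter how many races occur during the wait, the result delivered to the application is consistent.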
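Here is one way to combine the four states with the double-check loop, again as a Python asyncio sketch. DnsCache, the injected lookup coroutine, and the TTL and NEGATIVE_TTL values are all hypothetical. Parked requesters wait on an asyncio.Event and, after every wake-up, jump back to the top of the loop instead of reading the result directly. The owner does the same after publishing its result.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Optional

TTL = 30.0          # hypothetical positive-cache TTL, in seconds
NEGATIVE_TTL = 5.0  # hypothetical negative-cache TTL, in seconds


class Entry:
    def __init__(self) -> None:
        self.state = "idle"  # idle | in_progress | resolved | failed
        self.addresses: List[str] = []
        self.error: Optional[Exception] = None
        self.expires_at = 0.0
        self.done = asyncio.Event()  # completion signal for parked waiters


class DnsCache:
    def __init__(self, lookup: Callable[[str], Awaitable[List[str]]]) -> None:
        self._lookup = lookup  # hypothetical async upstream query
        self._entries: Dict[str, Entry] = {}

    async def resolve(self, host: str) -> List[str]:
        entry = self._entries.get(host)
        if entry is None:
            entry = self._entries[host] = Entry()
        while True:  # double-check: re-evaluate the state after every wake-up
            now = time.monotonic()
            if entry.state == "resolved" and now < entry.expires_at:
                return entry.addresses
            if entry.state == "failed" and now < entry.expires_at:
                raise entry.error
            if entry.state == "in_progress":
                await entry.done.wait()  # park until the owner signals
                continue  # may now be resolved, failed, idle, or expired again
            # Idle or expired: claim ownership. No await between check and claim.
            entry.state = "in_progress"
            entry.done = asyncio.Event()
            try:
                entry.addresses = await self._lookup(host)
                entry.state = "resolved"
                entry.expires_at = time.monotonic() + TTL
            except Exception as exc:
                entry.error = exc
                entry.state = "failed"
                entry.expires_at = time.monotonic() + NEGATIVE_TTL
            finally:
                if entry.state == "in_progress":  # owner was cancelled mid-lookup
                    entry.state = "idle"  # let a waiter take over on re-check
                entry.done.set()
            # Loop back: the owner returns its result through the same checks.


async def _demo() -> None:
    calls = 0

    async def fake_lookup(host: str) -> List[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.05)  # simulated upstream latency
        return ["192.0.2.1"]

    cache = DnsCache(fake_lookup)
    await asyncio.gather(*(cache.resolve("popular.example") for _ in range(1000)))
    print(f"upstream queries: {calls}")  # one owner, the rest parked


if __name__ == "__main__":
    asyncio.run(_demo())
```

With a single owner, the 1,000 concurrent requests in _demo cost one upstream query instead of 1,000. The finally clause also resets an abandoned In Progress entry to Idle if the owner is cancelled, so parked waiters re-check, find it Idle, and one of them takes over rather than waiting forever.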
Performance Trade-off: Stale-While-Revalidate
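For latency-critical systems, even one waiting request per TTL expiry may be too many. Many modern caches therefore implement a "Stale-While-Revalidate" approach: when an entry has expired but is still in memory, the system serves the old data immediately while spawning a single background task to refresh it. This removes DNS latency from the critical path for every name already in the cache; only the very first lookup of a name still has to wait. The trade-off is that clients may briefly receive addresses that are past their TTL.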
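A minimal Python asyncio sketch of stale-while-revalidate follows. SwrCache, its refreshing flag, and the injected lookup coroutine are hypothetical. The flag ensures that at most one background refresh per entry is in flight, so an expired popular name does not recreate the thundering herd. For brevity, cold misses are not coalesced here; in practice this would sit on top of the In Progress state described earlier.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Set


class SwrEntry:
    def __init__(self, addresses: List[str], expires_at: float) -> None:
        self.addresses = addresses
        self.expires_at = expires_at
        self.refreshing = False  # at most one background refresh in flight


class SwrCache:
    def __init__(self, lookup: Callable[[str], Awaitable[List[str]]], ttl: float) -> None:
        self._lookup = lookup  # hypothetical async upstream query
        self._ttl = ttl
        self._entries: Dict[str, SwrEntry] = {}
        self._tasks: Set[asyncio.Task] = set()  # strong refs so tasks are not GC'd

    async def resolve(self, host: str) -> List[str]:
        entry = self._entries.get(host)
        if entry is None:
            # Cold miss: there is nothing stale to serve, so this caller waits.
            # (Not coalesced in this sketch; the In Progress state handles that.)
            addresses = await self._lookup(host)
            self._entries[host] = SwrEntry(addresses, time.monotonic() + self._ttl)
            return addresses
        if time.monotonic() >= entry.expires_at and not entry.refreshing:
            entry.refreshing = True  # set before any await, so only one task spawns
            task = asyncio.create_task(self._refresh(host, entry))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)
        return entry.addresses  # fresh or stale, returned without waiting

    async def _refresh(self, host: str, entry: SwrEntry) -> None:
        try:
            entry.addresses = await self._lookup(host)
            entry.expires_at = time.monotonic() + self._ttl
        except Exception:
            pass  # keep serving stale data; a later request will retry
        finally:
            entry.refreshing = False
```

A failed refresh is swallowed so the stale addresses keep being served, and a later request simply triggers another refresh. Whether to cap how long stale data may be served is a policy decision this sketch leaves out.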
Closing Thoughts
Caching is easy; managing cache concurrency is hard. By wrapping DNS logic in a well-defined state machine, we decouple the application's performance from the unpredictability of the network.