Industrial-Grade Black Box Logging: Cross-Process Design with Shared Memory
When building high-frequency trading systems, autonomous driving control loops, or real-time search backends, traditional logging solutions often face a dilemma: detailed logging slows down the critical path, while asynchronous logging risks losing the last few milliseconds of crucial data—the "crime scene"—during a process crash.
This article explores a "black box" logging design pattern: using Memory-Mapped Files (mmap) to build a cross-process circular buffer. This design ensures ultra-low write latency while preserving the full context after an abnormal process exit, much like a flight data recorder.
We will use modern C++ (C++20) to demonstrate this core mechanism.
Why mmap?
Standard file I/O (std::ofstream, fwrite) or even asynchronous logging libraries (like spdlog's async mode) typically involve copying data from user space to kernel space or relying on memory queues. If the process encounters a SIGSEGV or SIGKILL, data in the memory queue evaporates instantly.
The advantages of mmap are:
- Zero-Copy Write: Writing is essentially a memcpy into the mapped pages, with the OS handling dirty page writeback.
- Crash Survival: As long as the OS kernel does not hang, data written to the mapped region persists in the filesystem even if the process crashes.
- Cross-Process Observability: Since it maps a file, external monitoring processes can map the same file and "peek" at the running state in real time, just like reading memory, without interfering with the main process.
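The crash-survival claim can be checked with a small experiment. The sketch below is a minimal illustration, assuming Linux or another POSIX system with a unified page cache; the helper name write_then_crash and the 4096-byte file size are hypothetical choices, not part of the logger. A child process writes into a MAP_SHARED mapping and is killed with SIGKILL before it can unmap or sync; the parent then reads the bytes back through the file.

```cpp
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: the child writes `msg` into a shared mapping of `path`
// and dies from SIGKILL; the parent reads the file with pread() and returns
// what it finds. Returns an empty string on any setup failure.
std::string write_then_crash(const char* path, const std::string& msg) {
    const size_t size = 4096;  // illustrative file size
    if (msg.empty() || msg.size() > size) return {};
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return {};
    if (ftruncate(fd, static_cast<off_t>(size)) == -1) { close(fd); return {}; }

    pid_t pid = fork();
    if (pid == -1) { close(fd); return {}; }
    if (pid == 0) {
        // Child: map, write, then die without munmap or msync.
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) _exit(1);
        std::memcpy(p, msg.data(), msg.size());
        kill(getpid(), SIGKILL);  // simulated crash
        _exit(2);                 // not reached
    }

    // Parent: wait for the child to die, then read through the file.
    int status = 0;
    if (waitpid(pid, &status, 0) == -1 || !WIFSIGNALED(status)) {
        close(fd);
        unlink(path);
        return {};
    }
    std::string out(msg.size(), '\0');
    ssize_t n = pread(fd, &out[0], out.size(), 0);
    close(fd);
    unlink(path);
    if (n < 0) return {};
    out.resize(static_cast<size_t>(n));
    return out;
}
```

Calling write_then_crash("/tmp/blackbox_crash_demo.bin", "last words") should hand the parent the child's bytes, because the dirty pages belong to the kernel's page cache rather than to the dead process.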
Core Architecture: Ring Buffer based on mmap
We need to design a fixed-length Header for synchronizing read/write positions, plus a fixed-length circular data area (Ring Buffer).
1. Memory Layout Design
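// layout.h
#pragma once
#include <atomic>
#include <cstdint>
struct LogHeader {
    // Illustrative marker value; a hex literal may only contain 0-9 and A-F
    static constexpr uint64_t MAGIC = 0xBADB0001;
    uint64_t magic;
    uint32_t version;
    std::atomic<uint32_t> write_offset; // Atomic for multi-process safety
    std::atomic<uint32_t> wrap_count;   // Lap count to distinguish new/old data
    uint64_t capacity;
};
// Atomics shared between processes must be lock-free (no hidden in-process lock)
static_assert(std::atomic<uint32_t>::is_always_lock_free, "uint32_t atomics must be lock-free");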
// Data follows immediately after the Header
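Because the header lives in a file that other processes and later runs will map, it helps to validate it before trusting any offset. The following is a minimal sketch under stated assumptions: DemoHeader is a hypothetical stand-in that mirrors the layout above, and its MAGIC and VERSION values, the HeaderStatus enum, and check_header are illustrative names that do not appear in the original design.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the shared header; the constants are illustrative.
struct DemoHeader {
    static constexpr uint64_t MAGIC = 0xB1ACB0C5ULL;
    static constexpr uint32_t VERSION = 1;
    uint64_t magic;
    uint32_t version;
    std::atomic<uint32_t> write_offset;
    std::atomic<uint32_t> wrap_count;
    uint64_t capacity;
};

// Compile-time checks that matter for memory shared across processes.
static_assert(std::atomic<uint32_t>::is_always_lock_free,
              "shared atomics must not rely on an in-process lock");
static_assert(std::is_standard_layout_v<DemoHeader>,
              "header layout must be predictable");

enum class HeaderStatus { Ok, BadMagic, BadVersion, BadCapacity, BadOffset };

// Validates a mapped header against the payload size the caller actually mapped.
HeaderStatus check_header(const DemoHeader& h, uint64_t mapped_payload_bytes) {
    if (h.magic != DemoHeader::MAGIC) return HeaderStatus::BadMagic;
    if (h.version != DemoHeader::VERSION) return HeaderStatus::BadVersion;
    if (h.capacity == 0 || h.capacity != mapped_payload_bytes)
        return HeaderStatus::BadCapacity;
    if (h.write_offset.load(std::memory_order_acquire) >= h.capacity)
        return HeaderStatus::BadOffset;
    return HeaderStatus::Ok;
}
```

A reader or a restarting writer could call check_header right after mmap and refuse to proceed, or reinitialize the file, on anything other than HeaderStatus::Ok.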
2. Writer Implementation (C++20)
std::span gives a bounds-aware view of the mapped payload, and std::filesystem::path carries the file location; the mapping itself is released in the destructor.
// blackbox_logger.h
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <filesystem>
#include <span>
#include <stdexcept>
#include <string_view>
#include "layout.h"
namespace fs = std::filesystem;
class BlackBoxLogger {
public:
    BlackBoxLogger(const fs::path& path, size_t size_mb) {
        const size_t payload_size = size_mb * 1024 * 1024;
        // write_offset is 32-bit, so the payload must be non-empty and fit in uint32_t
        if (payload_size == 0 || payload_size > UINT32_MAX)
            throw std::invalid_argument("size_mb must be between 1 and 4095");
        const size_t total_size = sizeof(LogHeader) + payload_size;
        int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd == -1) throw std::runtime_error("Failed to open file");
        // Pre-allocate file space
        if (ftruncate(fd, static_cast<off_t>(total_size)) == -1) {
            close(fd);
            throw std::runtime_error("Failed to resize file");
        }
        void* map_ptr = mmap(nullptr, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd); // fd can be closed after mmap is established
        if (map_ptr == MAP_FAILED) throw std::runtime_error("mmap failed");
        _base_ptr = static_cast<uint8_t*>(map_ptr);
        _header = reinterpret_cast<LogHeader*>(_base_ptr);
        _payload = std::span<uint8_t>(_base_ptr + sizeof(LogHeader), payload_size);
        // Initialize the header for a new file, or when the stored capacity
        // no longer matches the size we just mapped
        if (_header->magic != LogHeader::MAGIC || _header->capacity != _payload.size()) {
            _header->magic = LogHeader::MAGIC;
            _header->version = 1;
            _header->capacity = _payload.size();
            _header->write_offset = 0;
            _header->wrap_count = 0;
        }
    }
    // The object owns the mapping, so copying it would lead to a double munmap
    BlackBoxLogger(const BlackBoxLogger&) = delete;
    BlackBoxLogger& operator=(const BlackBoxLogger&) = delete;
    ~BlackBoxLogger() {
        // Not strictly necessary to munmap on destructor as OS cleans up, but good practice:
        munmap(_base_ptr, sizeof(LogHeader) + _payload.size());
    }
    void log(std::string_view msg) {
        // Simple length-prefixed protocol: [len:2][msg:n]
        // Messages longer than 65535 bytes are truncated to fit the 2-byte prefix
        uint16_t len = static_cast<uint16_t>(std::min<size_t>(msg.size(), UINT16_MAX));
        if (len == 0) return;
        write_raw(&len, sizeof(len));
        write_raw(msg.data(), len);
    }
private:
    void write_raw(const void* data, size_t size) {
        const uint8_t* src = static_cast<const uint8_t*>(data);
        size_t remaining = size;
        while (remaining > 0) {
            uint32_t current_offset = _header->write_offset.load(std::memory_order_acquire);
            size_t space_at_end = _payload.size() - current_offset;
            size_t chunk = std::min(remaining, space_at_end);
            std::memcpy(_payload.data() + current_offset, src, chunk);
            // Update pointers and state
            src += chunk;
            remaining -= chunk;
            uint32_t next_offset = current_offset + static_cast<uint32_t>(chunk);
            if (next_offset == _payload.size()) {
                // Reached the end of the ring: wrap to the start and count the lap
                next_offset = 0;
                _header->wrap_count.fetch_add(1, std::memory_order_relaxed);
            }
            // Commit new offset, making it visible to readers
            _header->write_offset.store(next_offset, std::memory_order_release);
        }
    }
    uint8_t* _base_ptr;
    LogHeader* _header;
    std::span<uint8_t> _payload;
};
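On the consumer side, a monitor or post-mortem tool has to walk the [len:2][msg:n] records and cope with a record that straddles the end of the ring. The sketch below shows one way to do that on a plain byte buffer. It assumes the caller already knows an offset that falls on a record boundary (for example 0 before the first wrap) and treats start == end as "no data"; append_record, decode_records, and the ring_copy helpers are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Copy n bytes out of the ring starting at pos, wrapping at cap (n <= cap).
static void ring_copy_out(const uint8_t* ring, size_t cap, size_t pos, void* dst, size_t n) {
    size_t first = std::min(n, cap - pos);
    std::memcpy(dst, ring + pos, first);
    std::memcpy(static_cast<uint8_t*>(dst) + first, ring, n - first);
}

// Copy n bytes into the ring starting at pos, wrapping at cap (n <= cap).
static void ring_copy_in(uint8_t* ring, size_t cap, size_t pos, const void* src, size_t n) {
    size_t first = std::min(n, cap - pos);
    std::memcpy(ring + pos, src, first);
    std::memcpy(ring, static_cast<const uint8_t*>(src) + first, n - first);
}

// Writer-side helper used only to build test data: appends one record and
// returns the new write offset. Assumes 0 < msg.size() <= 65535.
size_t append_record(uint8_t* ring, size_t cap, size_t pos, const std::string& msg) {
    uint16_t len = static_cast<uint16_t>(msg.size());
    ring_copy_in(ring, cap, pos, &len, sizeof(len));
    ring_copy_in(ring, cap, (pos + sizeof(len)) % cap, msg.data(), len);
    return (pos + sizeof(len) + len) % cap;
}

// Decodes every complete record between start (a known record boundary) and
// end (the committed write offset). Stops at the first incomplete record.
std::vector<std::string> decode_records(const uint8_t* ring, size_t cap,
                                        size_t start, size_t end) {
    std::vector<std::string> out;
    if (cap == 0 || start >= cap || end >= cap) return out;
    size_t available = (end + cap - start) % cap;  // committed bytes to scan
    size_t pos = start;
    while (available >= sizeof(uint16_t)) {
        uint16_t len = 0;
        ring_copy_out(ring, cap, pos, &len, sizeof(len));
        if (len == 0 || sizeof(len) + len > available) break;  // torn or partial
        std::string msg(len, '\0');
        ring_copy_out(ring, cap, (pos + sizeof(len)) % cap, &msg[0], len);
        out.push_back(std::move(msg));
        pos = (pos + sizeof(len) + len) % cap;
        available -= sizeof(len) + len;
    }
    return out;
}
```

Note that once the writer has lapped the ring, older record boundaries are overwritten, so a real reader would need some extra hint (such as a stored "oldest record" offset) to find a valid starting point; that extension is outside this sketch.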
Key Design Trade-offs
Atomicity vs. Consistency
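The code above uses std::atomic to update write_offset. This is a lock-free, single-writer design: the writer copies the bytes first and then publishes the new offset with a release store, so a reader that loads write_offset with acquire semantics sees every byte up to that offset. A reader may lag slightly behind the writer, which is the sense in which the view is "eventually consistent".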
- Pros: Extremely fast. No Mutexes, no system calls.
- Cons: Unsafe in Multi-Writer scenarios. This design is typically used for Single-Producer, Single-Consumer (Monitor) patterns. If multi-threaded writing is needed, locks are still required at the application layer, but the lock granularity is limited to the memcpy and does not involve I/O blocking.
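To make the application-layer locking concrete, here is a minimal sketch of a writer that serializes whole records with a std::mutex. It assumes writer threads within a single process and uses an in-memory std::vector as a stand-in for the mapped payload; LockedRing and its members are hypothetical names. A std::mutex does not protect against writers in other processes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <string_view>
#include <vector>

// Hypothetical multi-thread-safe ring; the vector stands in for the mmap payload.
class LockedRing {
public:
    explicit LockedRing(size_t capacity) : buf_(capacity) {}

    // Appends one [len:2][msg:n] record. The lock covers only the two copies,
    // so a record from one thread is never interleaved with another's.
    bool log(std::string_view msg) {
        if (msg.empty() || msg.size() > UINT16_MAX) return false;
        if (msg.size() + sizeof(uint16_t) > buf_.size()) return false;
        const uint16_t len = static_cast<uint16_t>(msg.size());
        std::lock_guard<std::mutex> guard(mu_);
        put(&len, sizeof(len));
        put(msg.data(), len);
        ++records_;
        return true;
    }

    size_t offset() const {
        std::lock_guard<std::mutex> guard(mu_);
        return off_;
    }

    uint64_t records() const {
        std::lock_guard<std::mutex> guard(mu_);
        return records_;
    }

private:
    // Copies n bytes at the current offset, wrapping at the end of the buffer.
    void put(const void* src, size_t n) {
        const uint8_t* p = static_cast<const uint8_t*>(src);
        const size_t first = std::min(n, buf_.size() - off_);
        std::memcpy(buf_.data() + off_, p, first);
        std::memcpy(buf_.data(), p + first, n - first);
        off_ = (off_ + n) % buf_.size();
    }

    mutable std::mutex mu_;
    std::vector<uint8_t> buf_;
    size_t off_ = 0;
    uint64_t records_ = 0;
};
```

Several threads can share one LockedRing and call log() concurrently. The critical section contains no system call, so contention costs stay close to the cost of the memcpy itself.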
Why not msync?
Many believe msync must be called to ensure data hits the disk. However, msync(MS_SYNC) blocks until writeback completes, defeating the purpose of low latency. We rely instead on the OS's dirty page writeback strategy (usually every few seconds) and on the fact that the Page Cache belongs to the kernel, not to the process. As long as power is not lost and the OS kernel is alive, data written before a crash remains in the Page Cache, so the file contents are up to date when the process restarts or when another process opens the file. If power is lost or the kernel panics, pages that have not yet been written back are lost.
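If some protection against power loss is wanted, one option is to keep msync off the critical path and call it only from a background thread or at chosen checkpoints. The sketch below is an illustration under that assumption; flush_range and demo_flush are hypothetical helpers, and the 4096-byte mapping is an arbitrary size.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Blocks until the given mapped range has been written back to the file.
// Intended for a background thread or shutdown path, never the hot path.
// `base` must be page-aligned, which is true for addresses returned by mmap.
bool flush_range(void* base, size_t length) {
    return msync(base, length, MS_SYNC) == 0;
}

// Small end-to-end demonstration: map a scratch file, write, flush, clean up.
bool demo_flush(const char* path) {
    const size_t size = 4096;  // illustrative mapping size
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return false;
    if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return false;
    }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) return false;
    std::memcpy(p, "checkpoint", 10);   // hot-path write: just a memcpy
    const bool ok = flush_range(p, size);  // slow path: explicit durability point
    munmap(p, size);
    unlink(path);
    return ok;
}
```

The write path stays a plain memcpy; only the caller that chooses to invoke flush_range pays the blocking cost.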
Summary
This "black box" logging pattern is ideal for recording system heartbeats, critical state changes, and final moments before a crash. It transforms the logging system from a "text stream" into a shared, memory-resident record store, enabling cross-process real-time monitoring and post-mortem debugging at roughly the cost of a memcpy on the write path.