Industrial-Grade Black Box Logging: Cross-Process Design with Shared Memory

When building high-frequency trading systems, autonomous driving control loops, or real-time search backends, traditional logging solutions often face a dilemma: detailed logging slows down the critical path, while asynchronous logging risks losing the last few milliseconds of crucial data—the "crime scene"—during a process crash.

This article explores a "black box" logging design pattern: using Memory-Mapped Files (mmap) to build a cross-process circular buffer. This design ensures ultra-low write latency while preserving the full context after an abnormal process exit, much like a flight data recorder.

We will use modern C++ (C++20) to demonstrate this core mechanism.

Why mmap?

Standard file I/O (std::ofstream, fwrite) and even asynchronous logging libraries (like spdlog's async mode) typically hold data in user space before it reaches the kernel, either in a stdio buffer or in an in-memory queue. If the process dies from a SIGSEGV or SIGKILL, whatever is still sitting in those buffers is lost.

The advantages of mmap are:

Zero-Copy Write: Writing is a plain memcpy into the mapped pages, with no system call and no extra user-to-kernel copy on the hot path. The OS handles dirty page writeback in the background.
Crash Survival: Pages of a shared mapping belong to the kernel's Page Cache, not to the process. As long as the kernel itself keeps running, data written to the mapped region reaches the file even if the process crashes.
Cross-Process Observability: Since the buffer is a file mapping, an external monitoring process can map the same file read-only and "peek" at the running state in real time, just like reading memory, without interfering with the main process.
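
The observability point can be tried in isolation. The following is a minimal sketch, assuming a POSIX system such as Linux; map_file and write_then_peek are hypothetical helper names, not part of the logger shown later. It writes through one read-write mapping and reads the bytes back through a second, read-only mapping of the same file, which is what a monitor process would do.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: map `size` bytes of `path` as MAP_SHARED.
// A writable mapping creates and sizes the file; returns nullptr on failure.
inline void* map_file(const char* path, std::size_t size, bool writable) {
    int fd = writable ? open(path, O_RDWR | O_CREAT, 0644) : open(path, O_RDONLY);
    if (fd == -1) return nullptr;
    if (writable && ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return nullptr;
    }
    int prot = writable ? (PROT_READ | PROT_WRITE) : PROT_READ;
    void* p = mmap(nullptr, size, prot, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : p;
}

// Writes `msg` through a read-write mapping, then reads it back through a
// separate read-only mapping of the same file, as an external monitor would.
inline std::string write_then_peek(const char* path, const std::string& msg) {
    constexpr std::size_t kSize = 4096;
    if (msg.size() > kSize) return {};
    void* writer = map_file(path, kSize, true);
    void* reader = map_file(path, kSize, false);
    std::string seen;
    if (writer && reader) {
        std::memcpy(writer, msg.data(), msg.size());  // no write(), no msync()
        seen.assign(static_cast<const char*>(reader), msg.size());
    }
    if (writer) munmap(writer, kSize);
    if (reader) munmap(reader, kSize);
    return seen;
}
```

Both mappings are backed by the same Page Cache pages, so the reader side needs no notification or flush to see the new bytes. In a real monitor the second mapping would live in a different process.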

Core Architecture: Ring Buffer based on mmap

We need to design a fixed-length Header for synchronizing read/write positions, plus a fixed-length circular data area (Ring Buffer).

1. Memory Layout Design

// layout.h
#include <cstdint>
#include <atomic>

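struct LogHeader {
    // ASCII "BLACKBOX"; marks a file whose header has been initialized
    static constexpr uint64_t MAGIC = 0x424C41434B424F58ULL;
    uint64_t magic;
    uint32_t version;
    std::atomic<uint32_t> write_offset; // Next write position in the payload; atomic so readers in other processes see a consistent value
    std::atomic<uint32_t> wrap_count;   // Lap count to distinguish new/old data
    uint64_t capacity;                  // Payload size in bytes (header excluded)
};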

// Data follows immediately after the Header
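
Because this struct is shared between separately running programs, a few layout properties are worth checking at compile time. The sketch below repeats the header fields so that it compiles on its own; payload_begin is a hypothetical helper, and the assertions are assumptions about what a shared-memory header should satisfy rather than requirements stated by the design above.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Same fields as the LogHeader above, repeated so the sketch compiles alone.
struct LogHeader {
    uint64_t magic;
    uint32_t version;
    std::atomic<uint32_t> write_offset;
    std::atomic<uint32_t> wrap_count;
    uint64_t capacity;
};

// A non-lock-free atomic typically relies on a lock that lives inside one
// process, which would not protect a value shared through a mapped file.
static_assert(std::atomic<uint32_t>::is_always_lock_free,
              "write_offset must be lock-free to be shared across processes");

// Standard layout keeps member order predictable for every program built
// against the same ABI.
static_assert(std::is_standard_layout_v<LogHeader>,
              "LogHeader must be a standard-layout type");

// The payload starts right after the header, so keep that boundary aligned.
static_assert(sizeof(LogHeader) % alignof(uint64_t) == 0,
              "payload should start on an 8-byte boundary");

// Hypothetical helper: first payload byte for a mapping that starts at `base`.
inline uint8_t* payload_begin(void* base) {
    return static_cast<uint8_t*>(base) + sizeof(LogHeader);
}
```

If any of these assertions fails on a target platform, the header needs a different representation there, for example plain integers accessed through compiler atomics.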

2. Writer Implementation (C++20)

std::span gives a bounds-aware view of the mapped payload, and std::filesystem::path keeps path handling portable.

// blackbox_logger.h
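#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <algorithm>   // std::min
#include <cstdint>
#include <cstring>     // std::memcpy
#include <filesystem>
#include <span>
#include <stdexcept>   // std::runtime_error
#include <string_view>
#include "layout.h"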

namespace fs = std::filesystem;

class BlackBoxLogger {
public:
    BlackBoxLogger(const fs::path& path, size_t size_mb) {
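        // write_offset is a 32-bit field, so the payload must stay below 4 GiB
        if (size_mb == 0 || size_mb >= 4096) throw std::runtime_error("size_mb must be between 1 and 4095");
        size_t total_size = sizeof(LogHeader) + (size_mb * 1024 * 1024);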
        
        int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd == -1) throw std::runtime_error("Failed to open file");
        
        // Pre-allocate file space
        if (ftruncate(fd, total_size) == -1) {
            close(fd);
            throw std::runtime_error("Failed to resize file");
        }

        void* map_ptr = mmap(nullptr, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd); // fd can be closed after mmap is established

        if (map_ptr == MAP_FAILED) throw std::runtime_error("mmap failed");

        _base_ptr = static_cast<uint8_t*>(map_ptr);
        _header = reinterpret_cast<LogHeader*>(_base_ptr);
        _payload = std::span<uint8_t>(_base_ptr + sizeof(LogHeader), total_size - sizeof(LogHeader));

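        // Initialize the header for a new file, or reset it when an existing
        // file was created with a different size (a stale write_offset could
        // otherwise point past the end of the new payload)
        if (_header->magic != LogHeader::MAGIC || _header->capacity != _payload.size()) {
            _header->magic = LogHeader::MAGIC;
            _header->version = 1;
            _header->capacity = _payload.size();
            _header->write_offset = 0;
            _header->wrap_count = 0;
        }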
    }

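    ~BlackBoxLogger() {
        // The OS unmaps on exit anyway, but releasing the mapping explicitly is good practice
        munmap(_base_ptr, sizeof(LogHeader) + _payload.size());
    }

    // The object owns the mapping, so a copy would munmap the same region twice
    BlackBoxLogger(const BlackBoxLogger&) = delete;
    BlackBoxLogger& operator=(const BlackBoxLogger&) = delete;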

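    void log(std::string_view msg) {
        // Simple length-prefixed protocol: [len:2][msg:n]
        // The prefix is 16 bits, so longer messages are truncated to 65535 bytes
        const size_t n = msg.size() > UINT16_MAX ? UINT16_MAX : msg.size();
        uint16_t len = static_cast<uint16_t>(n);
        if (len == 0) return;

        write_raw(&len, sizeof(len));
        write_raw(msg.data(), len);
    }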

private:
    void write_raw(const void* data, size_t size) {
        const uint8_t* src = static_cast<const uint8_t*>(data);
        size_t remaining = size;
        
        while (remaining > 0) {
            uint32_t current_offset = _header->write_offset.load(std::memory_order_acquire);
            size_t space_at_end = _payload.size() - current_offset;
            size_t chunk = std::min(remaining, space_at_end);

            std::memcpy(_payload.data() + current_offset, src, chunk);
            
            // Update pointers and state
            src += chunk;
            remaining -= chunk;
            
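            // Cannot overflow: chunk never exceeds the space left before the end of the payload
            uint32_t next_offset = static_cast<uint32_t>(current_offset + chunk);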
            if (next_offset == _payload.size()) {
                next_offset = 0;
                _header->wrap_count.fetch_add(1, std::memory_order_relaxed);
            }
            
            // Commit new offset, making it visible to readers
            _header->write_offset.store(next_offset, std::memory_order_release);
        }
    }

    uint8_t* _base_ptr;
    LogHeader* _header;
    std::span<uint8_t> _payload;
};

Key Design Trade-offs

Atomicity vs. Consistency

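The code above publishes write_offset with an atomic release store after each memcpy. This is an "eventually consistent" design: a reader that loads the offset with acquire semantics sees every byte written before that store, but it may lag behind the writer, and it can observe a record whose length prefix has landed while its body has not.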

Pros: Extremely fast. No Mutexes, no system calls.
Cons: Unsafe in multi-writer scenarios, because two writers can load the same write_offset and overwrite each other's bytes. This design is typically used for Single-Producer, Single-Consumer (Monitor) patterns. If multi-threaded writing is needed, a lock is still required at the application layer, but it only has to cover the memcpy and the offset update and never blocks on I/O.
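
The multi-threaded case can be sketched as follows. LockedRing is a hypothetical stand-in that uses a std::vector in place of the mapped payload so the example runs on its own; the point is only where the lock sits. Note that std::mutex covers threads inside one process. Writers in several processes would need a lock that itself lives in the shared mapping, such as a pthread mutex configured with PTHREAD_PROCESS_SHARED.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the mapped payload plus its write offset.
class LockedRing {
public:
    explicit LockedRing(std::size_t capacity) : buf_(capacity) {}

    // Appends one [len:2][msg:n] record. The lock spans only the two copies,
    // so a record from one thread is never interleaved with another's.
    // Returns false if the message is empty or cannot fit.
    bool log(std::string_view msg) {
        if (msg.empty() || msg.size() > UINT16_MAX) return false;
        if (msg.size() + sizeof(uint16_t) > buf_.size()) return false;
        const auto len = static_cast<uint16_t>(msg.size());

        std::lock_guard<std::mutex> guard(mu_);
        put(&len, sizeof(len));
        put(msg.data(), msg.size());
        return true;
    }

    // Position of the next write, safe to poll from a reader thread.
    std::size_t offset() const {
        return offset_.load(std::memory_order_acquire);
    }

private:
    // Copies `n` bytes at the current offset, wrapping at the end of the
    // buffer. Must be called with mu_ held.
    void put(const void* data, std::size_t n) {
        const auto* src = static_cast<const uint8_t*>(data);
        std::size_t pos = offset_.load(std::memory_order_relaxed);
        while (n > 0) {
            const std::size_t chunk = std::min(n, buf_.size() - pos);
            std::memcpy(buf_.data() + pos, src, chunk);
            src += chunk;
            n -= chunk;
            pos = (pos + chunk) % buf_.size();
        }
        offset_.store(pos, std::memory_order_release);
    }

    std::mutex mu_;
    std::vector<uint8_t> buf_;
    std::atomic<std::size_t> offset_{0};
};
```

Any number of threads can call log on the same LockedRing. The critical section contains no system call, so contention costs roughly one memcpy per waiting writer.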

Why not `msync`?

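Many believe msync must be called to ensure data hits the disk. That is true for durability across a power failure, but msync(MS_SYNC) blocks until writeback completes, which defeats the purpose of low latency. Instead, we rely on two things. First, the kernel writes dirty pages back on its own schedule: by default Linux wakes its flusher threads every 5 seconds and writes out pages that have been dirty for more than 30 seconds. Second, the pages of a shared mapping live in the kernel's Page Cache, not in the process, so they survive a process crash. As long as power is not lost and the kernel stays alive, the data remains in the Page Cache and the file contents are up to date when the process restarts or a monitor opens the file. A kernel panic or power loss can still discard whatever has not been written back yet.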
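
This claim is easy to check with a small experiment. The sketch below assumes Linux, where mapped pages and read() share one page cache; survive_crash is a hypothetical name. A child process maps a file, writes a message with a plain memcpy, and kills itself with SIGKILL without calling munmap or msync. The parent then reads the file through ordinary file I/O.

```cpp
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>
#include <string>

// Returns what the parent finds in `path` after the child has written `msg`
// through a shared mapping and died without any cleanup.
inline std::string survive_crash(const char* path, const std::string& msg) {
    constexpr std::size_t kSize = 4096;
    if (msg.empty() || msg.size() > kSize) return {};

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return {};
    if (ftruncate(fd, static_cast<off_t>(kSize)) == -1) {
        close(fd);
        return {};
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return {};
    }
    if (pid == 0) {
        // Child: write through the mapping, then die abruptly.
        void* p = mmap(nullptr, kSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) _exit(1);
        std::memcpy(p, msg.data(), msg.size());
        raise(SIGKILL);  // simulated crash: no munmap, no msync, no exit handlers
        _exit(1);        // not reached
    }

    // Parent: wait for the child to die, then read the file normally.
    int status = 0;
    waitpid(pid, &status, 0);
    std::string seen(msg.size(), '\0');
    ssize_t n = pread(fd, seen.data(), seen.size(), 0);
    close(fd);
    if (n < 0) return {};
    seen.resize(static_cast<std::size_t>(n));
    return seen;
}
```

The experiment covers a process crash only. It says nothing about a kernel panic or power loss, where pages that have not been written back are gone.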

Summary

This "black box" logging pattern is ideal for recording system heartbeats, critical state changes, and final moments before a crash. It transforms the logging system from a "text stream" into a "memory database," enabling cross-process real-time monitoring and post-mortem debugging at a cost of little more than one memcpy per record on the critical path.