Go 1.25 shipped runtime/trace.FlightRecorder, a circular buffer for execution traces. The concept is clean: keep recent data in memory, snapshot on demand, throw away what’s old. But runtime/trace captures goroutine scheduling and GC pauses. I wanted the same idea for structured logs.
So I built slogbox: a slog.Handler backed by a fixed-size ring buffer. You wire it up like any other handler, and it keeps the last N log records in memory for health checks, debug endpoints, or black box recording.
This post walks through every design decision. Not the “how to use it” guide (the README covers that) but the “why is it shaped this way” journal. Every choice was a trade-off, and I think the trade-offs are more interesting than the final code.
The ring buffer
The first question: how do you store the last N records efficiently?
The naive approach is append to a slice, then truncate when it gets too long. This works, but every truncation either copies elements or lets the backing array grow unbounded. In a handler that runs on every log call, that means allocations on the hot path and GC pressure you don’t need.
The solution is a pre-allocated slice with modulo arithmetic:
type recorder struct {
    mu        sync.RWMutex
    buf       []slog.Record
    head      int    // next write position
    count     int    // records stored (max = len(buf))
    total     uint64 // monotonic write counter
    flushOn   slog.Leveler
    flushTo   slog.Handler
    lastFlush uint64 // value of total claimed by the last flush
    maxAge    time.Duration
}
buf is allocated once in New() with exactly the capacity you asked for. Writes go to buf[head], then head advances with wraparound:
c.buf[c.head] = nr
c.head = (c.head + 1) % len(c.buf)
if c.count < len(c.buf) {
    c.count++
}
c.total++
No slice growth, no append, no copy on write. The only allocation that matters is the slog.Record itself, which the caller already created.
Reading the buffer back in order requires handling the wraparound. If the buffer isn’t full, records sit in buf[0:count]. If it is full, the oldest record is at head (it’s about to be overwritten next) and we need to read from head to end, then from start to head:
func (c *recorder) snapshotAll() []slog.Record {
    out := make([]slog.Record, c.count)
    if c.count < len(c.buf) {
        copy(out, c.buf[:c.count])
    } else {
        n := copy(out, c.buf[c.head:])
        copy(out[n:], c.buf[:c.head])
    }
    return out
}
The snapshot allocates once (the output slice) and uses copy, which is about as fast as Go gets for moving contiguous memory. The benchmarks confirmed this works: Handle runs at ~150 ns/op with 1 alloc on the hot path (the alloc comes from resolving and merging attributes into the stored record, not from the ring buffer itself).
Storing records, not strings
What should the buffer actually hold? The obvious choice is to format each record at write time (to JSON, text, whatever) and store the string. Then reads are trivial: concatenate the strings.
The buffer stores raw slog.Record values instead. Callers choose the serialization format at read time.
The reasoning: writes happen on every log call. Reads happen when someone hits /debug/logs or when an error triggers a flush. This is a write-heavy, read-rarely system. Formatting on the write path does work that gets thrown away as records rotate out of the buffer. Worse, it bakes in a format decision. If you stored JSON strings but later want to filter by level or grep by message, you’d have to unmarshal what you just marshaled.
Storing raw records means the buffer holds heavier values (a slog.Record has time, level, message, PC, and attrs). But the flexibility matters: you only pay serialization cost when someone actually looks at the data. For health check endpoints that fire once every 30 seconds, this is the right trade-off.
The read-side API reflects this:
Records() []slog.Record // raw snapshot, oldest to newest
RecordsAbove(slog.Level) []slog.Record // level-filtered snapshot
All() iter.Seq[slog.Record] // iterator over the same snapshot
JSON() ([]byte, error) // marshal as JSON array
WriteTo(w io.Writer) (int64, error) // stream JSON to any io.Writer
JSON() and WriteTo() for the common case, Records() and All() when you want to do something custom. RecordsAbove() sits in between: it returns the raw snapshot filtered by minimum level, so you can ask for “all warnings and errors in the buffer” without writing the filter yourself. It chains with MaxAge too: age filtering first, then level filtering.
Resolving values at Handle time
Here’s a correctness issue that’s easy to miss. slog.LogValuer lets you attach dynamic values to log attributes. A LogValuer is resolved by calling its LogValue() method, and the result can change over time. Think of a struct that returns its current state.
If you store the raw attr and resolve it later (at serialization time), you capture the state at read time, not at log time. That’s a bug. The record says “this happened at 14:03:02” but the attribute value reflects what the struct looked like at 14:05:17 when someone hit the debug endpoint.
The fix is to resolve eagerly in Handle:
r.Attrs(func(a slog.Attr) bool {
    v := a.Value.Resolve()
    if a.Key == "" && v.Kind() != slog.KindGroup {
        return true // skip: empty-key non-group attr
    }
    recordAttrs = append(recordAttrs, slog.Attr{Key: a.Key, Value: v})
    return true
})
The a.Key == "" guard deserves a note. The slog contract says to ignore attrs whose key and value are both the zero value, testable via a.Equal(slog.Attr{}). But Value.Equal panics when both sides hold the same non-comparable type (two slices stored via slog.Any, for example). Checking a.Key == "" is broader: it catches all empty-key attrs, not just zero-value ones. In practice the standard library handlers apply the same broader filter, and it avoids a panic surface in production code.
This resolution isn’t limited to Handle. WithAttrs applies the same eager resolution via a resolveAttrs helper, so handler-level attrs passed through logger.With(...) are also captured at registration time, not at log time. Consistency matters: if record-level attrs resolve eagerly but handler-level attrs resolve lazily, you get different snapshot semantics depending on where the attr was attached.
The locking model
A slog.Handler gets called from any goroutine. You can have 32 goroutines logging simultaneously while a health check endpoint reads the buffer. The question: how to handle concurrent reads and writes without killing performance?
The key insight is asymmetry. Writes are the hot path. Every log call goes through Handle, potentially thousands of times per second. Reads are the cold path: a health check endpoint, a debug dump, maybe a flush on error. Optimizing for writes at the expense of reads is the correct call.
sync.RWMutex fits this perfectly. Writers take an exclusive lock, but they only hold it for the handful of instructions that update the ring buffer. Readers share a read lock, and they only hold it long enough to snapshot the buffer (a make and copy). The actual work of serializing, filtering, or streaming happens after the lock is released.
The asymmetry shows up clearly in the code. Handle takes Lock():
c.mu.Lock()
c.buf[c.head] = nr
c.head = (c.head + 1) % len(c.buf)
if c.count < len(c.buf) {
    c.count++
}
c.total++
// ... flush snapshot if needed
c.mu.Unlock()
Records takes RLock():
c.mu.RLock()
out := c.snapshotAll()
maxAge := c.maxAge
c.mu.RUnlock()
The natural Go instinct is to reach for a channel instead. Send records to a goroutine that owns the buffer, let it serialize access without explicit locks. The problem is latency. A channel-based design means every Handle call does a channel send, which involves goroutine scheduling: the sender blocks until the receiver dequeues, and the receiver goroutine needs to be scheduled by the runtime. With a mutex, the writer updates the buffer directly in its own goroutine. No scheduling, no goroutine handoff, no channel allocation per record. At ~150 ns/op, the mutex path is roughly what a single unbuffered channel send costs on its own, before you even touch the buffer.
There’s a deeper structural mismatch too. Channels serialize all access: readers and writers take turns through the same goroutine. With RWMutex, multiple readers snapshot the buffer concurrently while only writers are exclusive. For a write-heavy, read-rarely system, that concurrency matters. A channel would turn every health check read into a message that waits behind hundreds of pending writes.
Under contention (32 goroutines writing), Handle_Parallel benchmarks at ~440 ns/op. Not zero overhead, but the mutex is held for so few instructions that the contention window is tiny. And the readers never block each other.
Flush outside the lock
The black box recorder pattern works like this: accumulate log records silently, and when something bad happens (an ERROR-level log), flush everything that’s been buffered to a real handler (like slog.JSONHandler writing to stderr). The question: what happens during that flush?
The naive approach flushes inside the write lock. After all, you need the lock to snapshot the buffer. But if FlushTo writes to stderr, or to a network, or to anything that does I/O, you’re blocking every writer for the duration of that I/O. One slow flush and your entire application’s logging stalls.
The solution: snapshot under the lock, claim the flush window immediately, unlock, then flush:
// ... after storing the record in the ring buffer (see "The locking model") ...
var flushRecords []slog.Record
if c.flushOn != nil && c.flushTo != nil && nr.Level >= c.flushOn.Level() {
    n := min(c.total-c.lastFlush, uint64(c.count))
    flushRecords = c.snapshotLast(int(n))
    // Claim the window immediately under the lock so concurrent flushes
    // compute non-overlapping ranges.
    c.lastFlush = c.total
}
c.mu.Unlock()
for _, fr := range flushRecords {
    if err := c.flushTo.Handle(context.Background(), fr); err != nil {
        return err
    }
}
return nil
The total and lastFlush counters do the bookkeeping. total is a monotonic counter that increments on every write. lastFlush records how far we’ve flushed. The difference tells us exactly how many records have accumulated since the last flush.
The important design choice: lastFlush is claimed immediately under the lock, before any I/O happens. This gives at-most-once delivery semantics. If FlushTo.Handle returns an error partway through, the claimed records are never re-sent. A concurrent flush triggered by another goroutine will compute its own non-overlapping range starting from where this one claimed. Simpler than tracking partial progress, and the right semantic for a black box recorder where re-sending stale context is worse than losing it.
A note on context: flushed records are replayed with context.Background(). The original request context is not available because Handle doesn’t store it. This is intentional, for three reasons. First, the flush replays old records, not the current one. When an ERROR fires, it drains the last N records: INFOs and DEBUGs accumulated over time, each from a different request with a different context. The ERROR’s context has no meaningful relationship to those older records. Second, storing a context.Context per record would pin entire context chains in memory (parent contexts, cancel functions, request-scoped values) until the record rotates out of the buffer. For a 500-slot buffer with 5-minute MaxAge, that’s 500 live context trees the GC can’t collect. Third, stale deadlines would cause false failures. A record logged 30 seconds ago had a request context whose deadline has already passed. Replaying it with that original context would cause FlushTo.Handle to fail immediately on ctx.Err(), defeating the purpose of the flush.
The explicit Flush(ctx) method is different: the caller provides a context that governs the entire drain operation, not individual records. During graceful shutdown you pass context.WithTimeout(ctx, 5*time.Second), and that single deadline covers the whole batch. That’s the right granularity for “spend up to 5 seconds draining before the process exits”.
One hazard worth calling out:
FlushTo must not log back to the same slogbox handler, directly or indirectly. If it does, the flushed record triggers another flush, which flushes records that trigger more flushes. The cycle exhausts the stack or deadlocks, depending on timing.
The benchmark tells the story: FlushTrigger costs ~32,900 ns/op, but that’s flushing ~100 records through a handler. The non-triggering path (Handle_WithFlush) runs at ~144 ns/op: just a nil check and a level comparison inside the lock.
Explicit flush for graceful shutdown
Level-triggered flush handles the “something went wrong” case. But what about clean shutdown? When a process exits, records accumulated since the last level-triggered flush are silently lost. If your service received 500 INFO-level requests after the last ERROR, those 500 records disappear.
Flush(ctx) drains pending records explicitly:
func (h *Handler) Flush(ctx context.Context) error {
    c := h.core
    if c.flushTo == nil {
        return nil
    }
    c.mu.Lock()
    n := min(c.total-c.lastFlush, uint64(c.count))
    if n == 0 {
        c.mu.Unlock()
        return nil
    }
    flushRecords := c.snapshotLast(int(n))
    c.lastFlush = c.total
    c.mu.Unlock()
    for _, fr := range flushRecords {
        if err := ctx.Err(); err != nil {
            return err
        }
        if err := c.flushTo.Handle(ctx, fr); err != nil {
            return err
        }
    }
    return nil
}
Three design differences from the level-triggered path. First, Flush only requires FlushTo to be set, not FlushOn. This enables a manual-only pattern: set FlushTo without FlushOn, and records are never flushed automatically but can be drained on demand during shutdown or health checks. Second, Flush accepts a caller-provided context instead of using context.Background(). During graceful shutdown you typically have a deadline (context.WithTimeout), and the flush should respect it. Third, ctx.Err() is checked between records, so a cancelled context stops delivery early rather than blocking on a dead FlushTo.
The at-most-once claim semantics are identical: the window is claimed under the lock before any I/O begins. If Flush fails partway through, those records are gone.
Age filtering with binary search
The MaxAge option excludes records older than a duration from read operations. The question: how do you filter old records without scanning the entire buffer?
The insight is that a ring buffer snapshot, once linearized by snapshotAll, is in chronological order. Oldest record first, newest last. This means we can binary search for the cutoff point:
func filterByAge(records []slog.Record, maxAge time.Duration, now time.Time) []slog.Record {
    cutoff := now.Add(-maxAge)
    i, _ := slices.BinarySearchFunc(records, cutoff, func(r slog.Record, t time.Time) int {
        return r.Time.Compare(t)
    })
    return records[i:]
}
O(log n) with zero allocation. We’re not creating a new slice; we’re returning a sub-slice of the snapshot that snapshotAll already allocated. slices.BinarySearchFunc is the modern standard library alternative to sort.Search, and the comparison function maps cleanly to time.Time.Compare.
A design note: Len() returns the physical count of records in the buffer, ignoring MaxAge. This is intentional. Len answers “how full is my buffer?” which is a capacity question. Records answers “what’s relevant right now?” which is a read question. Mixing the two would make Len time-dependent, which feels wrong for something that should be a simple count.
Shared buffer, independent attrs
slog.Handler has two methods that create derived handlers: WithAttrs adds default attributes, and WithGroup nests attributes under a group name. The question: when you call logger.With("service", "api"), should the new handler get its own copy of the ring buffer?
The naive approach copies the entire buffer per derived handler. But that defeats the purpose. You want all your loggers writing to the same buffer so the health check endpoint sees everything.
The solution: all handlers derived from the same New() call share a single *recorder. Only the attributes and groups are per-handler:
func (h *Handler) clone() *Handler {
    return &Handler{
        core:       h.core,
        level:      h.level,
        attrs:      slices.Clone(h.attrs),
        groups:     slices.Clone(h.groups),
        groupsUsed: h.groupsUsed,
    }
}
core is a pointer to the shared recorder. attrs and groups are cloned because each derived handler accumulates its own. When Handle runs, it merges the handler-level attrs with the record-level attrs into the stored record via mergeGroupAttrs. This means the buffer always contains fully-resolved records, so readers don’t need to know which handler variant wrote each entry.
The mergeGroupAttrs function handles the tricky part: recursively navigating nested group structures to merge attributes at the correct nesting level. If you have logger.WithGroup("request").With("method", "GET"), the method attr needs to end up under the request group in the stored record. It’s one of the more involved pieces of the code, but it’s only invoked on the write path when handler-level attrs exist, which is typically set up once at initialization.
The practical upshot: logger.With("service", "api") shares the buffer, and logger.WithGroup("request") shares the buffer. You can create as many derived loggers as you want without fragmenting your debug view.
io.WriterTo and HTTP composability
The library implements io.WriterTo, which makes it directly composable with http.ResponseWriter:
func (h *Handler) WriteTo(w io.Writer) (int64, error) {
    entries := recordsToEntries(h.Records())
    data, err := json.Marshal(entries)
    if err != nil {
        return 0, err
    }
    n, err := w.Write(data)
    return int64(n), err
}
This enables a /debug/logs endpoint in a single line:
http.Handle("GET /debug/logs", slogbox.HTTPHandler(rec, nil))
HTTPHandler sets Content-Type, calls WriteTo, and handles errors. When WriteTo fails, the error path checks how many bytes were written before the failure. If zero, the http.ResponseWriter hasn’t flushed headers yet, so the handler can still reply with 500. If bytes already went out, headers are committed and the status code can’t be changed. For that case, pass a callback as the second argument to log the error or take other action. Pass nil for the default behavior (500 when possible, silent drop when not).
The default output is JSON because that’s what most HTTP consumers expect. If you need a different format (logfmt, plain text, custom filtering), use Records() to get the raw slice and format however you want.
There’s a tension between flexibility and usefulness here. A library that only gives you raw records and says “format it yourself” is more flexible but less useful. A library that only gives you JSON is less flexible but immediately useful. Shipping both felt like the right balance.
Streaming with json/v2
The default WriteTo marshals the entire JSON array into a []byte before writing. For 100 records that’s fine. For 10,000 records, that intermediate allocation hurts.
When built with GOEXPERIMENT=jsonv2, WriteTo switches to a streaming implementation using jsontext.Encoder. Records are written one at a time, so the peak allocation drops from “entire JSON output” to “one record at a time.” The API is identical; the optimization is transparent via build tags.
The improvement is dramatic: for 10K records, allocations drop from ~190K to 35 and throughput improves roughly 27x (see the README benchmarks for exact numbers). The build tag exists rather than a runtime switch because the encoding/json/v2 package doesn’t exist without the experiment flag. Once jsonv2 stabilizes in a future Go release, the default path can switch over and the build tag goes away.
Observability without overhead
Once the buffer exists, you want to ask questions about it. “How full is my buffer?” “How many records have passed through?” “How many records are waiting to be flushed?” These are monitoring questions, and they should be cheap.
Four methods expose buffer state:
Len() int // records physically in buffer
Capacity() int // buffer size passed to New
TotalRecords() uint64 // monotonic write counter, reset only by Clear
PendingFlushCount() int // records not yet flushed (0 if FlushTo not set)
Len and Capacity are the obvious pair: how full is my buffer versus how big it is. TotalRecords is more interesting. It’s the total counter from the recorder, exposed read-only. It survives wrap-around because it’s a monotonic uint64, not a position in the ring. You can use it to compute throughput: sample TotalRecords at two points, divide by elapsed time, and you have records per second without touching the buffer contents.
PendingFlushCount answers a different question: how much context would be lost if the process crashed right now? It’s total - lastFlush, capped at count. If your monitoring shows this climbing steadily without any flushes, either your flush threshold is too high or your service isn’t hitting errors (which might be good news, depending on your perspective).
Len, TotalRecords, and PendingFlushCount take a read lock and return immediately. Capacity doesn’t even need a lock: the backing slice is allocated once in New and never resized, so len(buf) is immutable. No snapshots, no copies, no allocations.
Clearing the buffer
Clear removes all records and resets the flush state:
func (h *Handler) Clear() {
    c := h.core
    c.mu.Lock()
    defer c.mu.Unlock()
    clear(c.buf)
    c.head = 0
    c.count = 0
    c.total = 0
    c.lastFlush = 0
}
The implementation zeroes the backing slice (via the builtin clear), resets all counters, and sets lastFlush to zero so the next flush starts from a clean slate.
One concurrency subtlety: Clear does not wait for in-flight flushes. If a concurrent Handle has already claimed its flush window (snapshotted records and advanced lastFlush under the lock), those records will still be delivered to FlushTo after Clear returns. New records written after Clear form a fresh window. The race is benign because each flush operates on its own snapshot, but it means Clear is not a hard stop for pending I/O. In practice this only matters if you call Clear concurrently with error-triggered flushes, which is uncommon. For the typical use case of resetting between test runs or draining a debug endpoint, Clear does exactly what you’d expect.
When to use this
Health check endpoints. Your /healthz or /debug/logs endpoint returns the last 500 log records as JSON. No log infrastructure needed, no file tailing, just an HTTP call.
Black box recorder. Accumulate DEBUG and INFO logs silently. When an ERROR fires, flush everything to stderr or your real logging pipeline. You get the context around the error without paying for persistent DEBUG logging.
Dev and debugging. You’re iterating locally and don’t want to set up a log aggregator. Point your browser at localhost:8080/debug/logs and see what happened.
Lightweight services. Sidecars, CLI tools, small APIs that don’t justify a full ELK/LGTM stack. Keep recent logs in memory, expose them when needed.
Multi-handler composition. Pair it with a slog multi-handler. Your production handler writes to stdout for the aggregator; slogbox keeps a recent window for the health endpoint. Same logger, two destinations.
When to avoid this
You need persistent logs. The ring buffer is in-memory only. If the process dies, the buffer dies with it. For audit trails or compliance, use a real log pipeline.
You need cross-instance aggregation. Each instance has its own buffer. If you need to correlate logs across 50 pods, you need centralized logging.
You need millions of records. The buffer is a pre-allocated slice. A buffer of 1,000,000 slog.Record values will use significant memory. For large-scale retention, use a database or log store.
You need guaranteed delivery. The ring buffer silently overwrites old records when it’s full. If you can’t afford to lose a single log entry, this is the wrong tool.
Closing
Go 1.25’s runtime/trace.FlightRecorder applies the flight recorder concept at the runtime layer: goroutine scheduling, GC pauses, blocking events. slogbox applies the same concept at the application layer: structured log records.
Different stack layers, same idea. Keep recent data in a ring buffer, snapshot on demand, don’t pay for persistence you don’t need.
That’s still the point. Not every problem needs a framework. Sometimes a ring buffer, an RWMutex, and a handful of methods is exactly enough.