How Go 1.20's WithCancelCause and Go 1.21's WithTimeoutCause let you attach a reason to context cancellation, plus a gotcha with manual cancel and the stdlib pattern that covers every path.
I’ve spent way more hours than I’d like to admit debugging context canceled and
context deadline exceeded errors. These errors usually tell you that a context was
canceled, but not exactly why. In a typical client-server scenario, the reason could be any
of the following:
- The client disconnected
- A parent deadline expired
- The server started shutting down
- Some code somewhere called
cancel()explicitly
Go 1.20 and 1.21 added cause-tracking functions to the context package that fix this, but
there’s a subtlety with WithTimeoutCause that most examples skip.
What “context canceled” actually tells you
Here’s a function that processes an order by calling three services under a shared 5-second timeout:
func processOrder(ctx context.Context, orderID string) error {
ctx, cancel := context.WithTimeout(ctx, 5*time.Second) // (1)
defer cancel() // (2)
if err := checkInventory(ctx, orderID); err != nil {
return err // (3)
}
if err := chargePayment(ctx, orderID); err != nil {
return err
}
return shipOrder(ctx, orderID)
}
- (1) creates a derived context that automatically cancels after 5 seconds
- (2) cleans up the timer when the function returns, standard practice per the context package documentation
- (3) if anything goes wrong, including a context cancellation, the error is returned as-is
When a context gets canceled, the underlying reason is either context.Canceled or
context.DeadlineExceeded. Libraries wrap these in their own types (*url.Error for
net/http, gRPC status codes for grpc), but errors.Is still matches the sentinel.
So if checkInventory makes an HTTP call and the client disconnects while it’s in flight,
the error that bubbles all the way up is:
If the 5-second timeout fires while chargePayment is waiting on a slow payment gateway:
context deadline exceeded
Two sentinel errors. No reason, no origin, nothing. The caller of processOrder has no idea
what actually happened.
You’d think wrapping the error helps:
if err := checkInventory(ctx, orderID); err != nil {
return fmt.Errorf("checking inventory for %s: %w", orderID, err)
}
Now the log says:
checking inventory for ord-123: context canceled
Better. You know it happened during the inventory check. But you still don’t know why the context was canceled. Was it the 5-second timeout? A parent context’s deadline? The client hanging up? A graceful shutdown signal? The error doesn’t say.
Without the cause, you can’t tell whether to retry, alert, or ignore, and your logs don’t give on-call enough to triage.
When this happens in production, you end up scanning logs for other errors around the same timestamp, hoping something nearby gives you a clue. If the logs don’t help, you trace the context from where it was created, through every function that receives it, looking for cancel calls and timeouts. In a small service this takes a few minutes. In a larger codebase with middleware, interceptors, and nested timeouts, it can take a lot longer.
This has been a known pain point in the Go community for years. Bryan C. Mills noted this in issue #26356 back in 2018:
I’ve seen this sort of issue crop up several times now. I wonder if
context.Contextshould record a bit of caller information… Then we could add a debugging hook to interrogate why a particularcontext.Contextwas cancelled.
On proposal #51365, which eventually led to the cause APIs, bullgare described the production experience:
I had a case when on production I got random “context canceled” log messages. And in the case like that you don’t even know where to dig and how to investigate it further. Or how to reproduce it on a local machine.
That proposal led to the cause APIs that shipped in go 1.20.
Attaching a cause with WithCancelCause
context.WithCancelCause gives you a CancelCauseFunc that takes an error instead of a
plain CancelFunc. Here’s the same processOrder rewritten to use it:
func processOrder(ctx context.Context, orderID string) error {
ctx, cancel := context.WithCancelCause(ctx)
defer cancel(nil) // (1)
if err := checkInventory(ctx, orderID); err != nil {
cancel(fmt.Errorf(
"order %s: inventory check failed: %w", orderID, err,
)) // (2)
return err
}
if err := chargePayment(ctx, orderID); err != nil {
cancel(fmt.Errorf(
"order %s: payment failed: %w", orderID, err,
))
return err
}
return shipOrder(ctx, orderID)
}
- (1)
cancel(nil)as the default, sets the cause tocontext.Canceled - (2) before returning the error, records a specific reason that includes the original error
via
%w
Now you can read the cause with context.Cause(ctx). If checkInventory fails because of a
connection error, the cause comes back as:
order ord-123: inventory check failed: connection refused
Instead of just context canceled. You know it was the inventory check, you know it was a
connection error, and because the original error is wrapped with %w, the full error chain
is preserved for programmatic inspection.
The first call to cancel wins. Once a cause is recorded, subsequent calls are no-ops. So
defer cancel(nil) only takes effect if nothing else canceled the context first. This means
the most specific cancel, the one closest to the actual failure, is what gets recorded. If
checkInventory sets a cause and then defer cancel(nil) runs on the way out, the
inventory cause is preserved.
context.Cause is a standalone function rather than a method on Context because Go’s
compatibility promise means the Context interface can’t add new methods. Err() will
always return nil, Canceled, or DeadlineExceeded. If you call context.Cause on a
context that wasn’t created with one of the cause-aware functions, it returns whatever
ctx.Err() returns. On an uncanceled context, it returns nil.
This handles explicit cancellation, but the function still has no timeout. The original
version used WithTimeout for the 5-second deadline. To label that timeout with a cause, Go
1.21 added WithTimeoutCause:
ctx, cancel := context.WithTimeoutCause(
ctx,
5*time.Second,
fmt.Errorf("order %s: 5s processing timeout exceeded", orderID),
)
defer cancel()
When the timer fires, context.Cause(ctx) returns the custom error instead of a bare
context.DeadlineExceeded. There’s also WithDeadlineCause, which is the same thing but
takes an absolute time.Time. If all you need is a label on the timeout path,
WithTimeoutCause works. But there’s a subtlety in how it interacts with defer cancel()
that can silently discard your cause.
Why defer cancel() discards the cause
WithTimeoutCause returns (Context, CancelFunc), not (Context, CancelCauseFunc). The
cancel function you get back doesn’t accept an error argument. Proposal #56661 defined it
this way explicitly:
func WithTimeoutCause(
parent Context, timeout time.Duration, cause error,
) (Context, CancelFunc)
Think about what happens when processOrder finishes normally in 100ms, well before the
5-second timeout:
ctx, cancel := context.WithTimeoutCause(
ctx,
5*time.Second,
fmt.Errorf("order %s: 5s timeout exceeded", orderID),
)
defer cancel() // (1)
// ... returns in 100ms ...
- (1)
cancel()fires on return, before the timer
If the timer fires first (the function ran too long), the context is canceled with
DeadlineExceeded and context.Cause(ctx) returns your custom message. That path works
correctly.
But if the function returns first, which is the common case, defer cancel() fires. Since
it’s a plain CancelFunc, it can’t take a cause argument. The Go source shows what it does
internally:
return c, func() { c.cancel(true, Canceled, nil) }
It passes Canceled with a nil cause. Your custom cause only gets recorded when the
internal timer fires. On the normal return path, the cause is just context.Canceled.
This isn’t a bug. WithTimeoutCause is a new function, so it could have returned
CancelCauseFunc. The Go team chose not to. rsc explained the reasoning when closing
proposal #51365:
WithDeadlineCauseandWithTimeoutCauserequire you to say ahead of time what the cause will be when the timer goes off, and then that cause is used in place of the genericDeadlineExceeded. The cancel functions they return are plainCancelFuncs(with no user-specified cause), notCancelCauseFuncs, the reasoning being that the cancel on one of these is typically just for cleanup and/or to signal teardown that doesn’t look at the cause anyway.
He also acknowledged that this creates a subtle distinction between the two APIs:
That distinction makes sense, but it makes
WithDeadlineCauseandWithTimeoutCausedifferent in an important, subtle way fromWithCancelCause. We missed that in the discussion…
So WithTimeoutCause only carries the custom cause when the timeout actually fires. On the
normal return path and on any explicit cancellation path, defer cancel() discards it. If
you have a middleware that logs context.Cause(ctx) for every request, it’ll see
context.Canceled instead of something useful on the most common path.
Covering every path with a manual timer
The way around this is to skip WithTimeoutCause and wire the timer yourself using
WithCancelCause. Since there’s only one CancelCauseFunc, every path goes through the
same door, and first-cancel-wins handles the rest. Here’s processOrder one more time:
func processOrder(ctx context.Context, orderID string) error {
ctx, cancel := context.WithCancelCause(ctx) // (1)
defer cancel(errors.New("processOrder completed")) // (2)
timer := time.AfterFunc(5*time.Second, func() {
cancel(fmt.Errorf("order %s: 5s timeout exceeded", orderID)) // (3)
})
defer timer.Stop() // (4)
if err := checkInventory(ctx, orderID); err != nil {
cancel(fmt.Errorf(
"order %s: inventory check failed: %w", orderID, err,
))
return err
}
if err := chargePayment(ctx, orderID); err != nil {
cancel(fmt.Errorf("order %s: payment failed: %w", orderID, err))
return err
}
return shipOrder(ctx, orderID)
}
- (1) one
CancelCauseFuncfor everything - (2) the default cause if nothing else cancels first
- (3) the timer fires with a timeout-specific cause
- (4) stop the timer on normal return
Three possible paths, one cancel function. If the timer fires, context.Cause(ctx) returns:
order ord-123: 5s timeout exceeded
If checkInventory fails with a connection error:
order ord-123: inventory check failed: connection refused
On normal completion:
This is actually what the stdlib does internally; WithDeadline uses time.AfterFunc under
the hood.
The trade-off is that ctx.Err() always returns context.Canceled, never
context.DeadlineExceeded, because you’re using WithCancelCause instead of WithTimeout.
ctx.Deadline() also returns the zero value, which matters if downstream code or frameworks
use it to propagate deadlines (gRPC, for example, sends the deadline across service
boundaries via ctx.Deadline()). If downstream code branches on
errors.Is(err, context.DeadlineExceeded), that check won’t match either.
When you also need DeadlineExceeded
If downstream code relies on errors.Is(err, context.DeadlineExceeded) to distinguish
timeouts from explicit cancellations, stack a WithCancelCause on top of a
WithTimeoutCause:
func processOrder(ctx context.Context, orderID string) error {
ctx, cancelCause := context.WithCancelCause(ctx) // (1)
ctx, cancelTimeout := context.WithTimeoutCause( // (2)
ctx,
5*time.Second,
fmt.Errorf("order %s: 5s timeout exceeded", orderID),
)
defer cancelTimeout() // (3)
defer cancelCause(errors.New("processOrder completed")) // (4)
if err := checkInventory(ctx, orderID); err != nil {
cancelCause(fmt.Errorf(
"order %s: inventory check failed: %w", orderID, err,
))
return err
}
if err := chargePayment(ctx, orderID); err != nil {
cancelCause(fmt.Errorf(
"order %s: payment failed: %w", orderID, err,
))
return err
}
return shipOrder(ctx, orderID)
}
- (1) outer context for error-path and normal-completion causes
- (2) inner context with a timeout cause for the deadline path
- (3) deferred first, runs last (LIFO), cleans up the inner timeout context
- (4) deferred second, runs first (LIFO), cancels the outer context with a cause
When the timeout fires, the inner context gets canceled with DeadlineExceeded and the
custom cause. errors.Is(ctx.Err(), context.DeadlineExceeded) works as expected. On the
error path, cancelCause(specificErr) cancels the outer context, which propagates to the
inner. On normal completion, cancelCause("processOrder completed") runs first because of
LIFO defer ordering, canceling the outer and propagating to the inner. Then
cancelTimeout() finds the inner already canceled and does nothing.
Note
Notice the defer ordering. cancelCause must be deferred after cancelTimeout so it
runs before it (LIFO). If you reverse them, cancelTimeout() cancels the inner context
with context.Canceled before cancelCause gets a chance to set a meaningful cause.
One subtlety: after line (2), ctx points to the inner context. If you call
context.Cause(ctx) on it after a cancelCause(specificErr) call, you’ll see
context.Canceled (propagated from the outer), not the specific error. The specific cause
lives on the outer context. In practice this doesn’t matter because the caller inspects the
returned error, not context.Cause, but it’s worth knowing if you add logging inside
processOrder itself.
The manual timer pattern is simpler and covers most cases. This stacked approach is for when
downstream code specifically relies on errors.Is(err, context.DeadlineExceeded).
Reading and logging the cause
context.Cause returns an error, so the full errors.Is and errors.As machinery works
on it. Since the cause in processOrder wraps the original error with %w, you can unwrap
through it to reach the underlying error.
If checkInventory failed because the inventory service refused the connection, the cause
is "order ord-123: inventory check failed: connection refused", and the wrapped error is a
*net.OpError. You can pull it out:
cause := context.Cause(ctx)
var netErr *net.OpError
if errors.As(cause, &netErr) {
// The inventory service is unreachable.
slog.Error("network failure",
"op", netErr.Op,
"addr", netErr.Addr,
)
}
errors.Is works the same way. If the timer cause had wrapped context.DeadlineExceeded
(e.g., with fmt.Errorf("order timeout: %w", context.DeadlineExceeded)), you could check
for it:
if errors.Is(context.Cause(ctx), context.DeadlineExceeded) {
// A timeout fired; maybe adjust the deadline or retry.
}
For logging, ctx.Err() and context.Cause(ctx) serve different purposes. ctx.Err()
gives you the category (cancellation or timeout), and context.Cause(ctx) gives you the
specific reason. Keeping them as separate structured log fields makes them easy to query:
if ctx.Err() != nil {
slog.Error("request failed",
"err", ctx.Err(),
"cause", context.Cause(ctx),
)
}
That produces:
level=ERROR msg="request failed" err="context deadline exceeded"
cause="order ord-123: 5s timeout exceeded"
A useful pattern is wrapping the request context with WithCancelCause at the middleware
level so every handler downstream gets automatic cause tracking. The cancel function is
stashed in the context via WithValue so handlers can pull it out and set a specific cause:
type cancelCauseKey struct{}
func withCause(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ctx, cancel := context.WithCancelCause(r.Context()) // (1)
defer cancel(errors.New("request completed")) // (2)
ctx = context.WithValue(ctx, cancelCauseKey{}, cancel) // (3)
next.ServeHTTP(w, r.WithContext(ctx))
if ctx.Err() != nil { // (4)
slog.Error("request context canceled",
"method", r.Method,
"path", r.URL.Path,
"err", ctx.Err(),
"cause", context.Cause(ctx),
)
}
})
}
- (1) wrap the request context with
WithCancelCause - (2) default cause for normal completion
- (3) stash the cancel function so downstream handlers can reach it
- (4) only fires if the context was canceled during request handling (client disconnect,
handler cancel), not on normal completion;
defer cancel(...)hasn’t run yet at this point
Any handler can pull the cancel function out and set a cause:
func handleOrder(w http.ResponseWriter, r *http.Request) {
cancel, _ := r.Context().Value(cancelCauseKey{}).(context.CancelCauseFunc)
if err := processOrder(r.Context()); err != nil {
cancel(fmt.Errorf("order processing failed: %w", err))
http.Error(w, "order failed", http.StatusInternalServerError)
return
}
// ...
}
First cancel wins, so the most specific reason is what shows up in the middleware log.
streamingfast/substreams uses this approach in production, storing a CancelCauseFunc in
the request context so worker pools downstream can cancel with a specific error.
One thing to know: the stdlib’s HTTP server and most third-party libraries cancel contexts
without setting a cause, since they predate Go 1.20. If a client disconnects,
context.Cause(ctx) will return context.Canceled, not a custom error. The cause APIs are
most useful for reasons set by your own code.
Closing words
Most of the time, WithCancelCause is all you need. It covers explicit cancellation with a
specific reason, and context.Cause gives you a way to read it back. If you also need a
timeout, WithTimeoutCause labels the deadline path without extra wiring. The gotcha is
that defer cancel() on the normal return path discards the cause, so if you need causes on
every path, including normal completion, the manual timer pattern fills that gap. The
stacked approach on top of that is for when downstream code also needs DeadlineExceeded.
The cause APIs have seen steady adoption since Go 1.20. golang.org/x/sync/errgroup uses
WithCancelCause internally since v0.3.0, so context.Cause(ctx) on an errgroup-canceled
context returns the actual goroutine error. docker cli uses it to distinguish OS signals
from normal cancellation. kubernetes cluster-api migrated its codebase to the *Cause
variants. gRPC-Go had a proposal to use it for distinguishing client disconnects from gRPC
timeouts and connection closures.
Runnable examples:
§