A million ways to die from a data race in Go

原始链接: https://gaultier.github.io/blog/a_million_ways_to_data_race_in_go.html

## Go Data Races: A Deep Dive

This article explores data races, a surprisingly common problem in Go despite the language's reputation for concurrency. A data race occurs when Go code violates the Go memory model, and its effects can range from silent failures to arbitrary memory corruption, especially with multiword data structures such as maps and slices. The author walks through several real cases encountered in production code. One common pitfall is accidentally capturing a variable in a goroutine, leading to concurrent modification of shared state. Another involves concurrently mutating fields inside seemingly thread-safe types such as `http.Client`, underscoring the need to think carefully about mutability. A particularly subtle race involved a mutex guarding a global map whose lifetime did not match that of the data, rendering the synchronization ineffective. The key takeaway is that Go's ease of concurrency does not guarantee safety. The author recommends rigorous testing with the race detector, deep-cloning data to avoid shared mutable state, and minimizing reliance on closures. They also suggest potential language improvements, such as explicit capture lists for closures and compiler-generated `Clone()` functions, to help prevent these issues. Ultimately, vigilance and a solid understanding of the Go memory model are essential for writing robust concurrent applications.


Published on 2025-11-21


I have been writing production applications in Go for a few years now. I like some aspects of Go. One aspect I do not like is how easy it is to create data races in Go.

Go is often touted for how easy it makes writing highly concurrent programs. However, it is also mind-boggling how many ways Go happily gives us developers to shoot ourselves in the foot.

Over the years I have encountered and fixed many interesting kinds of data races in Go. If that interests you, I have written about Go concurrency in the past and about some existing footguns, without them necessarily being 'Go data races'.

So what is a 'Go data race'? Quite simply, it is Go code that does not conform to the Go memory model. Importantly, Go defines in its memory model what a Go compiler MUST do and MAY do when faced with a non-conforming program exhibiting data races. Not everything is allowed, quite the contrary in fact. Data races in Go are not benign either: their effects can range from 'no symptoms' to 'arbitrary memory corruption'.

Quoting the Go memory model:

This means that races on multiword data structures can lead to inconsistent values not corresponding to a single write. When the values depend on the consistency of internal (pointer, length) or (pointer, type) pairs, as can be the case for interface values, maps, slices, and strings in most Go implementations, such races can in turn lead to arbitrary memory corruption.

With this out of the way, let's take a tour of real data races in Go code that I have encountered and fixed. At the end I will offer some recommendations to (try to) avoid them.

I also recommend reading the paper A Study of Real-World Data Races in Golang. This article humbly hopes to be a spiritual companion to it. Some items here are also present in this paper, and some are new.

In the code I will often use errgroup.Group or sync.WaitGroup because they provide a fork-join pattern, shortening the code. The exact same can be done with 'raw' Go channels and goroutines. This also serves to show that using higher-level constructs does not magically protect against all data races.

The first kind of race, a variable accidentally captured by a closure, is very common in Go and also very easy to fall into. Here is a simplified reproducer:

package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

func Foo() error { return nil }
func Bar() error { return nil }
func Baz() error { return nil }

func Run(ctx context.Context) error {
	err := Foo()
	if err != nil {
		return err
	}

	wg, ctx := errgroup.WithContext(ctx)
	wg.Go(func() error {
		err = Baz()
		if err != nil {
			return err
		}

		return nil
	})
	wg.Go(func() error {
		err = Bar()
		if err != nil {
			return err
		}

		return nil
	})

	return wg.Wait()
}

func main() {
	println(Run(context.Background()))
}

The issue might not be immediately visible.

The problem is that the outer err variable is implicitly captured by the closures, each running in a separate goroutine, which then mutate err concurrently. What the author meant to do is use a variable local to each closure and return that. There is conceptually no need to share any data; the sharing is purely accidental.

The fix

Learnings

As I mentioned in a previous article where this silent behavior bit me, we can use the build flag -gcflags='-d closure=1' to make the Go compiler print which variables are captured by each closure:

$ go build -gcflags='-d closure=1' 
./main.go:20:8: heap closure, captured vars = [err]
./main.go:28:8: heap closure, captured vars = [err]

But it is not realistic to do this in a big codebase and inspect each closure. It is useful if you already suspect that a given closure might suffer from this problem.

The Go docs state about http.Client:

[...] Clients should be reused instead of created as needed. Clients are safe for concurrent use by multiple goroutines.

So imagine my surprise when the Go race detector flagged a race tied to http.Client. The code looked like this:

package main

import (
	"context"
	"net/http"

	"golang.org/x/sync/errgroup"
)

func Run(ctx context.Context) error {
	client := http.Client{}

	wg, ctx := errgroup.WithContext(ctx)
	wg.Go(func() error {
		client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
			if req.Host == "google.com" {
				return nil
			} else {
				return http.ErrUseLastResponse
			}
		}
		_, err := client.Get("http://google.com")
		return err
	})
	wg.Go(func() error {
		client.CheckRedirect = nil
		_, err := client.Get("http://amazon.com")
		return err
	})

	return wg.Wait()
}

func main() {
	println(Run(context.Background()))
}

The program makes two concurrent HTTP requests to two different URLs. For the first one, the code restricts redirects (I made up the exact check, so don't read too much into it; the real code had complex logic here). For the second one, no redirect checks are performed, by setting CheckRedirect to nil. This code is idiomatic and follows the recommendations from the documentation:

CheckRedirect specifies the policy for handling redirects. If CheckRedirect is not nil, the client calls it before following an HTTP redirect. If CheckRedirect is nil, the Client uses its default policy [...].

The problem is: the CheckRedirect field is modified concurrently without any synchronization, which is a data race.

This code also suffers from an I/O race: depending on the network speed and response times of the two URLs, the redirects might or might not be checked, since the callback can be overwritten by the other goroutine right when the HTTP client is about to call it.

Alternatively, the http.Client could end up calling a nil callback: the callback is non-nil when http.Client checks it, but the other goroutine sets it to nil before http.Client has the chance to call it. Boom, nil dereference.

The fix

Learnings

