Show HN: Git bayesect – Bayesian Git bisection for non-deterministic bugs

Original link: https://github.com/hauntsaninja/git_bayesect

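## git-bayesect: Bayesian Git bisection

`git-bayesect` is a tool for identifying the commit that changed the probability of an event (for example, a flaky test). Unlike the traditional `git bisect`, it uses Bayesian inference and only requires that a change *has* happened, not the exact failure rates.

It narrows down the range of likely commits iteratively, choosing the next commit to test by minimizing expected entropy. It uses Beta-Bernoulli conjugacy to handle the unknown failure probabilities.

**Key commands:**

* `git bayesect start --old $COMMIT`: start the bisection.
* `git bayesect pass --commit $COMMIT`: record a passing test.
* `git bayesect prior --commit $COMMIT --weight $WEIGHT`: set a prior belief about a commit.
* `git bayesect run $COMMAND`: automate testing with the given command.

You can also set priors based on filenames or on the commit message/diff content to get more accurate results. A demo script that generates a fake repository is included to help you get started.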

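## Git Bayesect: Bayesian bisection for non-deterministic bugs

A new tool, "git bayesect", tackles the difficulty of debugging non-deterministic problems with Git. Traditional `git bisect` needs a bug that reproduces consistently, which is impossible for flaky or probabilistic bugs.

Git Bayesect takes a Bayesian approach: it estimates the probability that each commit introduced the bug, even when test results are inconsistent. Users supply pass/fail observations, and the tool picks as the next commit to test the one whose result is expected to be most informative.

The discussion highlighted the tool's potential for tracking down performance regressions in noisy benchmarks, and asked whether raw benchmark scores could be fed in directly. The developer confirmed that repeated observations of a single commit are already supported, effectively folding the number of trials into the Bayesian update, and that adding an entry point for batch observations would be straightforward. The underlying math is explained in detail on the developer's website.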
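The developer's point about repeated observations follows from Beta-Bernoulli conjugacy: a Beta belief about a failure rate, updated one trial at a time, ends up exactly where a single update with the pass/fail counts lands, so a batch entry point would only need the counts. A minimal sketch; `BetaPosterior` is a hypothetical helper, not part of git_bayesect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta(alpha, beta) belief about one commit's failure rate (hypothetical helper)."""
    alpha: float = 1.0  # pseudo-count of failures; Beta(1, 1) is a uniform prior
    beta: float = 1.0   # pseudo-count of passes

    def observe(self, failed: bool) -> "BetaPosterior":
        # Conjugate update for a single pass/fail trial
        return BetaPosterior(self.alpha + int(failed), self.beta + int(not failed))

    def observe_batch(self, fails: int, passes: int) -> "BetaPosterior":
        # A batch of trials enters the update only through its counts
        return BetaPosterior(self.alpha + fails, self.beta + passes)

    def mean(self) -> float:
        # Posterior mean estimate of the failure rate
        return self.alpha / (self.alpha + self.beta)


runs = [True, False, False, True, False]  # True = the test failed on that run
one_at_a_time = BetaPosterior()
for failed in runs:
    one_at_a_time = one_at_a_time.observe(failed)
batched = BetaPosterior().observe_batch(fails=sum(runs), passes=len(runs) - sum(runs))
```

Both paths end at Beta(3, 4), a posterior mean failure rate of 3/7.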

Original text

Bayesian git bisection!

Use this to detect changes in likelihoods of events, for instance, to isolate a commit where a slightly flaky test became very flaky.

You don't need to know the likelihoods (although you can provide priors), just that something changed at some point, in some direction.

Or:

uv tool install git_bayesect

git_bayesect uses Bayesian inference to identify the commit that introduced a change. It selects the next commit to test by greedily minimising expected entropy, and it uses a Beta-Bernoulli conjugacy trick when calculating posterior probabilities, which keeps unknown failure rates tractable.
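To make the moving parts concrete, here is a minimal sketch of a model along these lines. It is not git_bayesect's code: it assumes a single change point, independent uniform Beta(1, 1) priors on the failure rate before and after the change, and per-commit (fails, passes) counts, with commits indexed oldest first and index 0 the known-old commit. All function names are hypothetical. The Beta-Bernoulli trick is `log_marginal`, which integrates an unknown rate out in closed form; `best_commit` is the greedy expected-entropy step. The write-up linked below gives the author's own derivation.

```python
import math


def log_beta(x: float, y: float) -> float:
    """log B(x, y) via log-gamma, numerically stable for large counts."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_marginal(fails: int, passes: int, a: float = 1.0, b: float = 1.0) -> float:
    """log P(one specific sequence with `fails` failures and `passes` passes),
    with the unknown failure rate integrated out under a Beta(a, b) prior."""
    return log_beta(a + fails, b + passes) - log_beta(a, b)


def log_terms(fails, passes, weights=None):
    """Unnormalised log posterior of each change point k = 1..n-1.
    Hypothesis k: commits [0, k) share one unknown rate, commits [k, n) another.
    `weights` are optional positive prior weights indexed by k."""
    n = len(fails)
    w = weights if weights is not None else [1.0] * n
    return {
        k: math.log(w[k])
        + log_marginal(sum(fails[:k]), sum(passes[:k]))
        + log_marginal(sum(fails[k:]), sum(passes[k:]))
        for k in range(1, n)
    }


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def posterior(fails, passes, weights=None):
    terms = log_terms(fails, passes, weights)
    z = logsumexp(list(terms.values()))
    return {k: math.exp(v - z) for k, v in terms.items()}


def entropy(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_entropy(fails, passes, c, weights=None) -> float:
    """Expected posterior entropy (bits) after one more trial at commit c."""
    log_z = logsumexp(list(log_terms(fails, passes, weights).values()))
    total = 0.0
    for failed in (True, False):
        f, p = list(fails), list(passes)
        if failed:
            f[c] += 1
        else:
            p[c] += 1
        # P(outcome | data) = Z(data + outcome) / Z(data)
        p_outcome = math.exp(logsumexp(list(log_terms(f, p, weights).values())) - log_z)
        total += p_outcome * entropy(posterior(f, p, weights).values())
    return total


def best_commit(fails, passes, weights=None) -> int:
    """Greedy step: the commit whose next trial minimises expected entropy."""
    return min(range(len(fails)), key=lambda c: expected_entropy(fails, passes, c, weights))
```

For example, with ten passes recorded on commit 0, ten failures on commit 4 and nothing in between, the four candidate change points are equally likely, and `best_commit([0, 0, 0, 0, 10], [10, 0, 0, 0, 0])` returns 2, the commit that splits them two against two. Testing commit 0 or 4 again would tell this model nothing new about where the change is.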

See https://hauntsaninja.github.io/git_bayesect.html for a write up.

Start a Bayesian bisection:

git bayesect start --old $COMMIT

Record an observation on the current commit:

Or on a specific commit:

git bayesect pass --commit $COMMIT

Check the overall status of the bisection:

Reset:

Set the prior for a given commit:

git bayesect prior --commit $COMMIT --weight 10
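The README does not say exactly how a weight enters the calculation. A minimal sketch under one common reading, which is an assumption here: weights are relative prior masses (default 1) that multiply each candidate commit's likelihood before normalising. The commit IDs and likelihood values are made up for illustration, and `posterior_with_weights` is a hypothetical name.

```python
from __future__ import annotations


def posterior_with_weights(
    weights: dict[str, float], likelihoods: dict[str, float]
) -> dict[str, float]:
    """Combine per-commit prior weights with per-commit likelihoods.
    Assumption: weights are relative (unnormalised) and default to 1."""
    unnorm = {c: weights.get(c, 1.0) * lik for c, lik in likelihoods.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


# Hypothetical commits with equal evidence; "c4" was given --weight 10
equal = posterior_with_weights({"c4": 10}, {"c1": 0.2, "c2": 0.2, "c3": 0.2, "c4": 0.2})
# Same weight, but the observations are ten times less likely at "c4"
offset = posterior_with_weights({"c4": 10}, {"c1": 0.2, "c2": 0.2, "c3": 0.2, "c4": 0.02})
```

Under this reading, a weight of 10 makes a commit ten times as likely a priori as a default-weight one (10/13 versus 1/13 each above), and it is exactly cancelled by evidence ten times less likely at that commit, so strong observations still override a mistaken prior.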

Set prior for all commits based on filenames:

git bayesect priors_from_filenames --filenames-callback "return 10 if any('suspicious' in f for f in filenames) else 1"

Set prior for all commits based on the text in the commit message + diff:

git bayesect priors_from_text --text-callback "return 10 if 'timeout' in text.lower() else 1"
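Both callbacks are Python function bodies that return a weight, with `filenames` or `text` in scope, as the examples show. It can be handy to check a body locally before handing it to the CLI. A minimal sketch, assuming the body runs as a function whose only parameter is `filenames` (a list of paths) or `text` (a string); `make_callback` is a hypothetical helper and not necessarily how git_bayesect compiles the body.

```python
import textwrap


def make_callback(body: str, arg: str):
    """Hypothetical: wrap a function body into a callable taking one parameter `arg`.
    exec runs arbitrary code, so only use bodies you wrote yourself."""
    src = f"def _cb({arg}):\n" + textwrap.indent(body, "    ")
    namespace: dict = {}
    exec(src, namespace)
    return namespace["_cb"]


filenames_cb = make_callback(
    "return 10 if any('suspicious' in f for f in filenames) else 1", "filenames"
)
text_cb = make_callback("return 10 if 'timeout' in text.lower() else 1", "text")
```

For instance, `filenames_cb(["src/suspicious_cache.py", "README.md"])` gives 10 and `text_cb("Fix typo")` gives 1, so a typo in a callback shows up before any bisection state is touched.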

Get a log of commands to let you reconstruct the state:

Undo the last observation:

Run the bisection automatically using a command to make observations:
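Automatic mode boils down to turning each run of a command into a pass or a fail. A minimal sketch of that step, assuming exit status 0 means pass and anything else means fail, loosely following the good/bad convention of `git bisect run` minus its special codes such as 125 for "skip". How `git bayesect run` actually interprets exit codes is not stated here, and `observe`, `run_trials` and `run_python_trials` are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys


def observe(cmd: list[str]) -> bool:
    """Run one trial of a test command; True means it passed.
    Assumption: exit status 0 = pass, any other status = fail."""
    return subprocess.run(cmd, check=False).returncode == 0


def run_trials(cmd: list[str], n: int) -> tuple[int, int]:
    """Run `cmd` n times on the current checkout and return (fails, passes)."""
    fails = sum(1 for _ in range(n) if not observe(cmd))
    return fails, n - fails


def run_python_trials(script: str, n: int) -> tuple[int, int]:
    # Convenience wrapper for Python checks such as the demo's flaky.py
    return run_trials([sys.executable, script], n)
```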

Check out the best commit to test:

This repository contains a little demo, in case you'd like to play around:

```bash
# Create a fake repository with a history to bayesect over
python scripts/generate_fake_repo.py
cd fake_repo

# The fake repo contains a script called flaky.py
# This is a simple script that fails some fraction of the time
# At some point in the history of the repo, that fraction was changed
python flaky.py
git log --oneline

# Start the bayesection
OLD_COMMIT=$(git rev-list HEAD --reverse | head -n 2 | tail -n 1)
git bayesect start --new main --old $OLD_COMMIT

# Run a bayesection to find the commit that introduced the change
git bayesect run python flaky.py
```
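The demo's `flaky.py` is not reproduced here. A hypothetical stand-in with the same behaviour, failing with a fixed probability on each run, helps show why a single run cannot locate the change. `flaky_exit_code` and `failure_fraction` are illustrative names, not taken from the repository.

```python
from __future__ import annotations

import random


def flaky_exit_code(fail_rate: float, rng: random.Random | None = None) -> int:
    """One run of a flaky check: exit status 1 with probability fail_rate, else 0."""
    rng = rng or random.Random()
    return 1 if rng.random() < fail_rate else 0


def failure_fraction(fail_rate: float, runs: int, seed: int = 0) -> float:
    """Empirical failure fraction over many seeded runs."""
    rng = random.Random(seed)
    return sum(flaky_exit_code(fail_rate, rng) for _ in range(runs)) / runs
```

As a standalone script you would end it with `sys.exit(flaky_exit_code(FAIL_RATE))`, where `FAIL_RATE` is a hypothetical constant that the fake history changes at one commit. If that rate moved from 0.05 to 0.3, a run on a commit after the change would still pass 70% of the time, which is why the bisection keeps accumulating observations instead of trusting any single result.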