Git commands I run before reading any code

Original link: https://piechowski.io/post/git-commands-before-reading-code/

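Before reading any code, use five key Git commands to quickly assess the health of a new codebase. First, identify **churn hotspots** (`git log --format=format: --name-only ... | sort | uniq -c | sort -nr | head -20`): frequently modified files, which often indicate complexity or areas developers avoid. Next, determine the **bus factor** (`git shortlog -sn --no-merges`) by identifying the key contributors; heavy concentration signals risk, especially if those individuals are no longer involved. Find **bug clusters** (`git log -i -E --grep="fix|bug|broken" ...`) by searching commit messages for bug-related keywords, then cross-reference them with the churn hotspots to pinpoint high-risk areas. Use a **commit velocity chart** (`git log --format='%ad' ...`) to gauge project momentum, looking for sustained activity or a worrying decline. Finally, assess **firefighting frequency** (`git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'`): frequent rollbacks point to deployment problems or unreliable tests. These commands give a quick diagnosis that reveals likely problems *before* you open a file, saving time and focusing attention where it is most needed.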

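A Hacker News discussion highlighted this blog post, which details Git commands for getting a quick read on a codebase before digging into it. The author uses standard `git log` and `git shortlog` invocations to answer questions such as: which files change most often, who has contributed the most, where bugs concentrate, and whether the project is actively developed or in decline. The commands filter and aggregate commit history to identify frequently modified files (often the ones developers avoid), the main contributors, bug-prone areas, and commit frequency over time. Commenters also discussed `jj` (Jujutsu, a Git-compatible version control system), noting its verbosity relative to standard Git and comparing its complexity to that of the Nix package manager. While it is powerful, some considered it unnecessary and preferred Git's familiarity and ubiquity, especially when the analysis shown is not critical to their daily workflow. Others found these heuristics helpful and stressed the importance of good Git practices.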

Original article

Five git commands that tell you where a codebase hurts before you open a single file. Churn hotspots, bus factor, bug clusters, and crisis patterns.

Ally Piechowski · Apr 8, 2026 · 4 min read

The Git Commands I Run Before Reading Any Code

The first thing I usually do when I pick up a new codebase isn’t opening the code. It’s opening a terminal and running a handful of git commands. Before I look at a single file, the commit history gives me a diagnostic picture of the project: who built it, where the problems cluster, whether the team is shipping with confidence or tiptoeing around land mines.

What Changes the Most

git log --format=format: --name-only --since="1 year ago" | grep -v '^$' | sort | uniq -c | sort -nr | head -20

The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

High churn on a file doesn’t mean it’s bad. Sometimes it’s just active development. But high churn on a file that nobody wants to own is the clearest signal of codebase drag I know. That’s the file where every change is a patch on a patch. The blast radius of a small edit is unpredictable. The team pads their estimates because they know it’s going to fight back.

A 2005 Microsoft Research study found churn-based metrics predicted defects more reliably than complexity metrics alone. I take the top 5 files from this list and cross-reference them against the bug hotspot command below. A file that’s high-churn and high-bug is your single biggest risk.

Who Built This

git shortlog -sn --no-merges

Every contributor ranked by commit count. If one person accounts for 60% or more, that’s your bus factor. If they left six months ago, it’s a crisis. If the top contributor from the overall shortlog doesn’t appear in a 6-month window (git shortlog -sn --no-merges --since="6 months ago"), I flag that to the client immediately.
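The 60% threshold and the six-month comparison can be scripted. The following is a minimal sketch, not the author's tooling: `top_share` and `is_active` are hypothetical helper names, and both only parse the `<count><TAB><name>` lines that `git shortlog -sn` prints.

```shell
# top_share: read `git shortlog -sn` output on stdin ("<count>\t<name>")
# and print the largest contributor's share of commits as a whole-number
# percentage, a tab, and the contributor's name.
top_share() {
  awk -F'\t' '
    { count = $1 + 0; total += count
      if (count > max) { max = count; name = $2 } }
    END { if (total > 0) printf "%d%%\t%s\n", int(max * 100 / total), name }
  '
}

# is_active NAME: succeed if NAME appears as a contributor in the
# shortlog output on stdin, fail otherwise.
is_active() {
  awk -F'\t' -v who="$1" '$2 == who { found = 1 } END { exit !found }'
}

# Hypothetical usage inside a repository:
#   top=$(git shortlog -sn --no-merges HEAD | top_share | cut -f2)
#   git shortlog -sn --no-merges --since="6 months ago" HEAD |
#     is_active "$top" || echo "top contributor inactive: $top"
```

The explicit `HEAD` in the usage lines matters in scripts: when no revision is given and standard input is not a terminal, `git shortlog` reads the log from standard input instead of the repository. Matching is by exact name, so a contributor who commits under two spellings will look like two people unless the repository has a mailmap.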

I also look at the tail. Thirty contributors but only three active in the last year. The people who built this system aren’t the people maintaining it.

One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

Where Do Bugs Cluster

git log -i -E --grep="fix|bug|broken" --name-only --format='' | sort | uniq -c | sort -nr | head -20

Same shape as the churn command, filtered to commits with bug-related keywords. Compare this list against the churn hotspots. Files that appear on both are your highest-risk code: they keep breaking and keep getting patched, but never get properly fixed.

This depends on commit message discipline. If the team writes “update stuff” for every commit, you’ll get nothing. But even a rough map of bug density is better than no map.
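Comparing the two lists by eye works for 20 files, but the overlap can also be computed. This is a rough sketch under a few assumptions: `rank_files` and `hotspot_overlap` are hypothetical helper names, and `churn.txt` and `bugs.txt` are placeholder file names.

```shell
# rank_files [N]: read file paths on stdin (blank lines are ignored) and
# print the N most frequently listed paths, most frequent first.
rank_files() {
  sed '/^$/d' | sort | uniq -c | sort -k1,1nr -k2 |
    head -n "${1:-20}" | sed -E 's/^ *[0-9]+ //'
}

# hotspot_overlap CHURN_FILE BUG_FILE: print the paths that appear in
# both lists, keeping the order of CHURN_FILE.
hotspot_overlap() {
  awk 'NR == FNR { in_bugs[$0] = 1; next } $0 in in_bugs' "$2" "$1"
}

# Hypothetical usage inside a repository:
#   git log --format=format: --name-only --since="1 year ago" | rank_files 20 > churn.txt
#   git log -i -E --grep="fix|bug|broken" --name-only --format='' | rank_files 20 > bugs.txt
#   hotspot_overlap churn.txt bugs.txt
```

Anything `hotspot_overlap` prints is a file that is both high-churn and high-bug. An empty result does not prove the code is healthy; it may only mean the commit messages carry no bug keywords.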

Is This Project Accelerating or Dying

git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c

Commit count by month, for the entire history of the repo. I scan the output looking for shapes. A steady rhythm is healthy. But what does it look like when the count drops by half in a single month? Usually someone left. A declining curve over 6 to 12 months tells you the team is losing momentum. Periodic spikes followed by quiet months means the team batches work into releases instead of shipping continuously.
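On a long history, the halving pattern is easy to miss by eye. As a minimal sketch, a small filter can flag it; `flag_drops` is a hypothetical name, and the "half or less of the previous listed month" rule is one possible reading of the heuristic, not a standard metric.

```shell
# flag_drops: read "<count> <YYYY-MM>" lines (the output of `uniq -c`)
# in chronological order and print each month whose commit count is at
# most half of the previous listed month's count.
# Caveat: months with zero commits never appear in the input, so a
# complete gap is compared against the next month that has commits.
flag_drops() {
  awk '
    NR > 1 && prev > 0 && $1 * 2 <= prev { printf "%s: %d -> %d\n", $2, prev, $1 }
    { prev = $1 }
  '
}

# Hypothetical usage inside a repository:
#   git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c | flag_drops
```

Each flagged line is a prompt for a question ("who left around then?"), not a conclusion by itself.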

I once showed a CTO their commit velocity chart and they said “that’s when we lost our second senior engineer.” They hadn’t connected the timeline before. This is team data, not code data.

How Often Is the Team Firefighting

git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback'

Revert and hotfix frequency. A handful over a year is normal. Reverts every couple of weeks means the team doesn’t trust its deploy process. They’re evidence of a deeper issue: unreliable tests, missing staging, or a deploy pipeline that makes rollbacks harder than they should be. Zero results is also a signal; either the team is stable, or nobody writes descriptive commit messages.
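To judge whether the matches are "a handful" or a habit, it helps to see them next to the total commit count. A minimal sketch, using the same keywords as the command above; `firefight_summary` is a hypothetical helper name.

```shell
# firefight_summary: read `git log --oneline` output on stdin and print
# how many commit lines mention revert, hotfix, emergency, or rollback
# (case-insensitive), alongside the total number of commits read.
firefight_summary() {
  awk '
    tolower($0) ~ /revert|hotfix|emergency|rollback/ { hits++ }
    END { printf "%d of %d commits\n", hits, NR }
  '
}

# Hypothetical usage inside a repository:
#   git log --oneline --since="1 year ago" | firefight_summary
```

Over a one-year window, a count in the mid-twenties works out to roughly one firefighting commit every two weeks, which is the pattern described above.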

Crisis patterns are easy to read. Either they’re there or they’re not.

These five commands take a couple minutes to run. They won’t tell you everything. But you’ll know which code to read first, and what to look for when you get there. That’s the difference between spending your first day reading the codebase methodically and spending it wandering.

This is the first hour of what I do in a codebase audit. Here’s what the rest of the week looks like.