Patch applies fake diffs from commit messages

Original link: https://samizdat.dev/phantom-patch/

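This report describes unexpected behavior that can occur when applying patches downloaded from sources such as GitHub. GitHub serves commits as `.patch` files, and when such a file is processed with GNU `patch`, the tool appears to apply the intended changes plus any diff-shaped text contained in the commit message itself.

The author demonstrates this with a public repository whose commit adds `readme.md` but whose commit message also contains a fake diff for a file that is not part of the commit (`SHOULD_NOT_BE_HERE.md`). Downloading the patch and applying it with `patch -p1` creates *two* files: the expected `readme.md` and the unexpected `SHOULD_NOT_BE_HERE.md`.

`git apply` and `git am` refused to write a path inside `.git/`, but they still accepted an injected diff aimed at a working-tree file. The author does not yet know whether the fault lies with GNU `patch`, GitHub's patch export, or the patch format itself, and highlights a potential weakness in common patch workflows.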

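A recent Hacker News discussion focused on the `patch` tool and how it handles commit messages: it can wrongly apply "fake diffs" taken from text *inside* a commit message. Commenters argued that the root problem is the Unix tradition of every tool using its own, poorly documented file format, rather than a more standardized option such as XML or JSON. XML or JSON would add clarity, but at the cost of the human readability of today's patch format.

The problem is compounded by the fact that `git` itself can export and re-import patches whose commit messages contain such diffs. Suggested fixes included having GitHub return patch files formatted like `git show` output (with the commit message indented), and commenters acknowledged the inherent risk of in-band signaling without proper escaping. Ultimately, the discussion underscored how hard it is to stay compatible with decades-old tools and formats.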

Original article

GitHub, like many other code hosts, exposes mail-style patches at .patch URLs. If you download one of those patches and feed it to GNU patch, diff-shaped text inside the commit message can be applied as if it were part of the real patch.

It matters (to me) because wget/curl plus patch is not some exotic lab setup. It is a very old, very ordinary way to move a patch from one machine to another.

Public reproducer

From dd28283159930b8fff2119aa9f75af8b4c1ed8b2 Mon Sep 17 00:00:00 2001
From: Egor Kovetskiy <e.kovetskiy [spam] gmail.com>
Date: Wed, 22 Apr 2026 06:37:11 +0000
Subject: [PATCH] readme: add initial file

The body includes a fake diff for patch workflow testing.

diff --git a/SHOULD_NOT_BE_HERE.md b/SHOULD_NOT_BE_HERE.md
new file mode 100644
index 0000000..802992c
--- /dev/null
+++ b/SHOULD_NOT_BE_HERE.md
@@ -0,0 +1 @@
+Hello world
---
 readme.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 readme.md

diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..b44b8fd
--- /dev/null
+++ b/readme.md
@@ -0,0 +1 @@
+Demo repository

Here is the smallest public demo I could make:

The real commit changes one file: readme.md.

If you inspect the commit in GitHub’s UI, that is all you see.

But the commit message also contains a fake unified diff:

diff --git a/SHOULD_NOT_BE_HERE.md b/SHOULD_NOT_BE_HERE.md
new file mode 100644
index 0000000..802992c
--- /dev/null
+++ b/SHOULD_NOT_BE_HERE.md
@@ -0,0 +1 @@
+Hello world

So the exported patch has two layers:

  1. The real patch changes readme.md.
  2. The phantom patch lives inside the commit message and creates SHOULD_NOT_BE_HERE.md.
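A minimal sketch of telling the two layers apart, assuming the usual git format-patch layout that this export follows: mail headers, one blank line, the commit message, a line consisting only of three dashes (---), then the diffstat and the real diff. The function name split_mail_patch is hypothetical.

```python
def split_mail_patch(text: str) -> tuple[str, str]:
    """Split a format-patch style mail into (commit message, patch part).

    Hypothetical helper. Assumed layout: mail headers, one empty line, the
    commit message, a line that is exactly '---', then diffstat and diff.
    """
    lines = text.splitlines()
    # Mail headers end at the first empty line.
    start = lines.index("") + 1 if "" in lines else 0
    body = lines[start:]
    # The message ends at the first line that is exactly '---'. The
    # '--- /dev/null' header of a fake diff does not match this test,
    # so an embedded diff stays on the message side.
    end = body.index("---") if "---" in body else len(body)
    return "\n".join(body[:end]), "\n".join(body[end + 1:])
```

On the reproducer, the first returned string (the message) holds the whole SHOULD_NOT_BE_HERE.md diff, and the second holds only the readme.md change. A commit message may itself contain a bare --- line, so this split is a reading aid, not a security boundary.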

Scenario:

wget -O /tmp/dd28283.patch \
  https://github.com/kovetskiy/git-example/commit/dd28283.patch
patch -p1 < /tmp/dd28283.patch

Output:

patching file SHOULD_NOT_BE_HERE.md
patching file readme.md
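This output is consistent with how GNU patch reads its input: its documentation says it skips leading garbage before a diff, so that a mail message containing a diff can be fed to it directly. The sketch below approximates that scanning behavior and is not GNU patch's actual parser: every line starting with "--- " that is directly followed by a "+++ " line is treated as a file header, wherever it appears, and the leading path component is stripped as -p1 would. diff_targets is a hypothetical name.

```python
def diff_targets(text: str, strip: int = 1) -> list[str]:
    """Approximate which files a diff-scanning tool would patch.

    Hypothetical helper, not GNU patch's parser: any '--- ' line directly
    followed by a '+++ ' line counts as a file header, wherever it appears
    in the input, including inside a commit message.
    """
    lines = text.splitlines()
    targets = []
    for old, new in zip(lines, lines[1:]):
        if old.startswith("--- ") and new.startswith("+++ "):
            # Drop an optional tab-separated timestamp after the path.
            path = new[4:].split("\t")[0]
            # Emulate -pN by removing the first N path components.
            targets.append("/".join(path.split("/")[strip:]))
    return targets
```

Run against the reproducer, diff_targets returns ["SHOULD_NOT_BE_HERE.md", "readme.md"]: the same two files, in the same order, as the patch output above.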

SHOULD_NOT_BE_HERE.md was never part of the real commit.
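The exported patch itself hints at this. The diffstat that follows the --- separator summarizes the real commit (1 file changed), while the text contains two diff --git headers. A minimal sketch of that cross-check, using a hypothetical diffstat_counts helper; it is a heuristic only, since a commit message can imitate a diffstat line as easily as a diff.

```python
import re

# git's diffstat summary line, e.g. " 1 file changed, 1 insertion(+)".
SUMMARY = re.compile(r"^\s*(\d+) files? changed")


def diffstat_counts(text: str) -> tuple[int, int]:
    """Return (files claimed by the diffstat, 'diff --git' headers found).

    Hypothetical helper. Keeps the last summary line seen, assuming it is
    the one git wrote after the '---' separator.
    """
    claimed = 0
    headers = 0
    for line in text.splitlines():
        match = SUMMARY.match(line)
        if match:
            claimed = int(match.group(1))
        if line.startswith("diff --git "):
            headers += 1
    return claimed, headers
```

For the reproducer it returns (1, 2): one file promised, two file headers present.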

Is something broken?

I am not sure. But from my POV, GNU patch -p1 does not reliably separate two things:

  • the actual diff exported from the commit
  • diff-shaped text embedded in the commit message

Scope

The public demo writes an ordinary file because that is easy to publish and easy to inspect.

Locally, I also targeted .git/hooks/post-applypatch, and GNU patch happily accepted that (why wouldn't it, right?).

Fortunately, git apply and git am behaved better in one narrow sense: they rejected the .git/... path. But they still accepted an injected diff for an ordinary working-tree file.
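The .git/ rejection is the kind of check any patch consumer can run on its list of target paths before writing files. A minimal sketch with a hypothetical is_unsafe_target helper: it flags absolute paths, .. components, and any .git component (compared case-insensitively, since some filesystems fold case). It approximates the narrow behavior described above and is not Git's implementation.

```python
from pathlib import PurePosixPath


def is_unsafe_target(path: str) -> bool:
    """Return True for patch targets a downloaded patch should never write.

    Hypothetical helper, not Git's implementation: flags absolute paths,
    '..' components, and any '.git' component (compared case-insensitively,
    since some filesystems fold case).
    """
    p = PurePosixPath(path)
    if p.is_absolute():
        return True
    return any(part == ".." or part.lower() == ".git" for part in p.parts)
```

For example, is_unsafe_target(".git/hooks/post-applypatch") is True, while is_unsafe_target("SHOULD_NOT_BE_HERE.md") is False. That second case is the gap described above: an ordinary working-tree path passes a check like this, so path filtering alone does not stop an injected diff.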

NOTE: git cherry-pick behaves differently. It works with Git objects directly instead of parsing an exported patch, so the commit message is never read as a diff.

Takeaway

I do not yet know whether the bug belongs to GNU patch, GitHub's .patch export, or the broader patch-format contract. But next time I will look at the commit message more closely.
