We don't need more non-programmers contributing code
LLVM AI tool policy: human in the loop

Original link: https://discourse.llvm.org/t/rfc-llvm-ai-tool-policy-human-in-the-loop/89159

## LLVM AI 贡献政策概要 拟议的 LLVM AI 贡献政策以 **“人工参与循环”** 为中心。贡献者欢迎*使用* AI 工具,但对他们的贡献**完全负责**,并且必须在提交之前彻底审查和理解所有 AI 生成的内容。核心原则是防止“提取性贡献”——即需要比其提供价值更多的维护者审查工作量的提交。 本质上,贡献者不应提交他们无法解释或捍卫的工作;“LLM 做的”不能被接受。透明度是关键——贡献者应清楚地表明 AI 工具的使用情况(例如,在提交消息中)。 该政策禁止在没有人工监督的情况下运行的自动化代理。虽然允许 LLM 辅助工作流程(例如生成文档*然后*手动审查),但不允许自动提交或审查工具。 维护者将标记潜在的提取性贡献,要求提供理由。重复不合规可能会导致对话锁定。该政策还强化了现有的版权责任——贡献者即使使用 AI 生成的内容,仍要对版权合规负责。该政策旨在平衡利用 AI 提高生产力的收益,同时保护维护者的时间并促进可持续、友好的社区。

A recent Hacker News thread discussed llvm.org's new policy, namely that it does not want code contributions from non-programmers. The discussion highlighted growing frustration with the volume of low-quality code submissions, likely fueled by the rising use of large language models such as ChatGPT.

Commenters expressed disappointment that this even needed to be stated explicitly, pointing to the ongoing review burden and an exponentially growing number of submissions. A major concern is contributors submitting code they do not understand, often relying on LLMs without verifying the results, which leads to responses like "an LLM generated it."

While some companies successfully use AI-driven code review tools, llvm.org's policy reflects a desire to maintain code quality and avoid automated, unverified suggestions. The overall sentiment is that balancing open-source contribution with preserving project integrity is a challenge in an era when AI code generation is readily available.

Original text

Hey folks, I got a lot of feedback from various meetings on the proposed LLVM AI contribution policy, and I made some significant changes based on that feedback. The current draft proposal focuses on the idea of requiring a human in the loop who understands their contribution well enough to answer questions about it during review. The idea here is that contributors are not allowed to offload the work of validating LLM tool output to maintainers. I’ve mostly removed the Fedora policy in an effort to move from the vague notion of “owning the contribution” to a more explicit “contributors have to review their contributions and be prepared to answer questions about them”. Contributors should never find themselves in the position of saying “I don’t know, an LLM did it”. I felt the change here was significant, and deserved a new thread.

From an informal show of hands at the round table at the US LLVM developer meeting, most contributors (or at least the subset with the resources and interest in attending this round table in person) are interested in using LLM assistance to increase their productivity, and I really do want to enable them to do so, while also making sure we give maintainers a useful policy tool for pushing back against unwanted contributions.

I’ve updated the PR, and I’ve pasted the markdown below as well, but you can also view it on GitHub.


Policy

LLVM’s policy is that contributors can use whatever tools they would like to
craft their contributions, but there must be a human in the loop.
Contributors must read and review all LLM-generated code or text before they
ask other project members to review it.
The contributor is always the author
and is fully accountable for their contributions. Contributors should be
sufficiently confident that the contribution is high enough quality that asking
for a review is a good use of scarce maintainer time, and they should be able
to answer questions about their work
during review.

We expect that new contributors will be less confident in their contributions,
and our guidance to them is to start with small contributions that they can
fully understand to build confidence. We aspire to be a welcoming community
that helps new contributors grow their expertise, but learning involves taking
small steps, getting feedback, and iterating. Passing maintainer feedback to an
LLM doesn’t help anyone grow, and does not sustain our community.

Contributors are expected to be transparent and label contributions that
contain substantial amounts of tool-generated content. Our policy on
labelling is intended to facilitate reviews, and not to track which parts of
LLVM are generated. Contributors should note tool usage in their pull request
description, commit message, or wherever authorship is normally indicated for
the work. For instance, use a commit message trailer like `Assisted-by:`.
This transparency helps the community develop best practices
and understand the role of these new tools.
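
For illustration only, a commit message carrying such a trailer might look like the following; the change described and the placeholder tool name are invented for the example, and the policy does not mandate an exact format:

```
[docs] Clarify how to regenerate the AST node reference

Used an LLM to draft the new paragraph, then reviewed and edited it
for correctness before posting.

Assisted-by: <name of the tool used>
```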

An important implication of this policy is that it bans agents that take action
in our digital spaces without human approval, such as the GitHub @claude
agent. Similarly, automated review tools that
publish comments without human review are not allowed. However, an opt-in
review tool that keeps a human in the loop is acceptable under this policy.
As another example, using an LLM to generate documentation, which a contributor
manually reviews for correctness, edits, and then posts as a PR, is an approved
use of tools under this policy.

This policy includes, but is not limited to, the following kinds of
contributions:

  • Code, usually in the form of a pull request
  • RFCs or design proposals
  • Issues or security vulnerabilities
  • Comments and feedback on pull requests

Extractive Contributions

The reason for our “human-in-the-loop” contribution policy is that processing
patches, PRs, RFCs, and comments to LLVM is not free – it takes a lot of
maintainer time and energy to review those contributions! Sending the
unreviewed output of an LLM to open source project maintainers extracts work
from them in the form of design and code review, so we call this kind of
contribution an “extractive contribution”.

Our golden rule is that a contribution should be worth more to the project
than the time it takes to review it. These ideas are captured by this quote
from the book
Working in Public by Nadia Eghbal:

"When attention is being appropriated, producers need to weigh the costs and
benefits of the transaction. To assess whether the appropriation of attention
is net-positive, it’s useful to distinguish between extractive and
non-extractive contributions. Extractive contributions are those where the
marginal cost of reviewing and merging that contribution is greater than the
marginal benefit to the project’s producers. In the case of a code
contribution, it might be a pull request that’s too complex or unwieldy to
review, given the potential upside." -- Nadia Eghbal

Prior to the advent of LLMs, open source project maintainers would often review
any and all changes sent to the project simply because posting a change for
review was a sign of interest from a potential long-term contributor. While new
tools enable more development, they shift effort from the implementor to the
reviewer, and our policy exists to ensure that we value and do not squander
maintainer time.

Reviewing changes from new contributors is part of growing the next generation
of contributors and sustaining the project. We want the LLVM project to be
welcoming and open to aspiring compiler engineers who are willing to invest
time and effort to learn and grow, because growing our contributor base and
recruiting new maintainers helps sustain the project over the long term. Being
open to contributions and liberally granting commit access
is a big part of how LLVM has grown and successfully been adopted all across
the industry. We therefore automatically post a greeting comment to pull
requests from new contributors and encourage maintainers to spend their time to
help new contributors learn.

Handling Violations

If a maintainer judges that a contribution is extractive (i.e. it doesn’t
comply with this policy), they should copy-paste the following response to
request changes, add the extractive label if applicable, and refrain from
further engagement:

This PR appears to be extractive, and requires additional justification for
why it is valuable enough to the project for us to review it. Please see
our developer policy on AI-generated contributions:
http://llvm.org/docs/AIToolPolicy.html

Other reviewers should use the label to prioritize their review time.
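
As a sketch of the triage step, a maintainer using the GitHub CLI might apply the label like so (assuming the repository defines an `extractive` label; the PR number is hypothetical):

```sh
# Post the canned response via the normal review UI, then label the PR
# so other reviewers can deprioritize it.
gh pr edit 12345 --add-label extractive
```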

The best ways to make a change less extractive and more valuable are to reduce
its size or complexity or to increase its usefulness to the community. These
factors are impossible to weigh objectively, and our project policy leaves this
determination up to the maintainers of the project, i.e. those who are doing
the work of sustaining the project.

If a contributor responds but doesn’t make their change meaningfully less
extractive, maintainers should escalate to the relevant moderation or admin
team for the space (GitHub, Discourse, Discord, etc.) to lock the conversation.

Copyright

Artificial intelligence systems raise many questions around copyright that have
yet to be answered. Our policy on AI tools is similar to our copyright policy:
Contributors are responsible for ensuring that they have the right to
contribute code under the terms of our license, typically meaning that either
they, their employer, or their collaborators hold the copyright. Using AI tools
to regenerate copyrighted material does not remove the copyright, and
contributors are responsible for ensuring that such material does not appear in
their contributions. Contributions found to violate this policy will be removed
just like any other offending contribution.

Examples

Here are some examples of contributions that demonstrate how to apply
the principles of this policy:

  • This PR contains a proof from Alive2, which is a strong signal of
    value and correctness (see the sketch after this list).
  • This generated documentation was reviewed for correctness by a
    human before being posted.
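
For context, an Alive2 proof is typically a pair of LLVM IR functions that the alive-tv tool checks for refinement. A minimal sketch of the shape of such a proof (illustrative only, not taken from the PR referenced above; src and tgt are alive-tv's default function names):

```llvm
; alive-tv checks that @tgt refines @src for all possible inputs.
define i32 @src(i32 %x) {
  %r = mul i32 %x, 2
  ret i32 %r
}

define i32 @tgt(i32 %x) {
  %r = shl i32 %x, 1
  ret i32 %r
}
```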
