Open-source projects face a growing problem: LLM-generated issues.
LLM policy?

Original link: https://github.com/opencontainers/runc/issues/4990

The project is seeing an increase in LLM-generated pull requests and bug reports, which has prompted the need for explicit acceptance guidelines. A major concern is verifying the validity of LLM-generated issues: because their descriptions typically contain excessive and possibly inaccurate information, the proposal is to treat them as spam and ask reporters to provide the original prompt instead. For code contributions, the suggestion is that submitters must demonstrate genuine understanding of their changes by responding to review requests in their own words. This addresses concerns about authorship and about meeting the Developer Certificate of Origin (DCO) requirements, though opinions on the latter diverge. Ultimately, the team needs to decide on a policy and document it in `CONTRIBUTING.md`, possibly following Incus's example of banning LLM-generated contributions outright. The goal is to maintain code quality and ensure that issue reports describe real problems.

## LLM-Generated Content Floods Open-Source Projects

The open-source community is increasingly worried about the influx of low-quality LLM-generated content, especially in GitHub issues and pull requests. Developers report a surge in automatically created bug reports, some stemming from company-mandated AI integrations, which are often inaccurate and time-consuming to verify. The core problem is that this "AI slop" wastes maintainers' time, can mislead them with false reports, and demands substantial effort to validate. Some suggest a strict "no AI" policy under which suspected AI-generated submissions are closed automatically unless their authenticity is demonstrated. Transparency is also contested: even if detection is impossible, should developers disclose AI assistance in their contributions? Some argue that passing off AI-generated work as original is disrespectful and shifts the burden of review onto volunteers; others worry that such policies would deter contributions altogether. The issue has become significant enough to affect critical infrastructure projects, and it calls for a community-wide discussion of how to meet this new challenge.
Original text

We've seen a slight uptick in pull-requests and bug reports which appear to be LLM-generated, so it's probably about time to come to a decision on what we should and should not accept and document this somewhere (presumably in CONTRIBUTING.md).

My personal opinion is we shouldn't accept anything LLM-generated, but this is probably not the common position of most @opencontainers/runc-maintainers, so we should probably consider LLM-generated code and issues separately.

IMHO, we should close all LLM-generated issues as spam, because even if they are describing real issues the entire issue description contains so much unneeded (and probably incorrect) information that it'd be better if they just provided their LLM prompt as an issue instead. More importantly, when triaging bugs we have to assume that what the user has written did actually happen to them, but with LLM-generated issues -- who knows whether the description is actually describing something real? (See #4982 and #4972 as possible examples of LLM-generated bug reports.)

For LLM-generated code, I think the minimum bar should be that the submitter needs to be able to respond to review requests in their own words (i.e., they understand what their patch does and were able to write the code themselves). (#4940 and #4939 were the most recent examples of this I can think of, and I'm not convinced the submitter would've cleared this bar.)

(FWIW, my view is that LLM-generated code cannot fulfil the requirements of the DCO and so we shouldn't accept it for the legal reasons alone, but I appreciate this is a minority view.)

For reference, Incus added a note to their CONTRIBUTING.md earlier this year, banning all LLM usage.
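A policy section in `CONTRIBUTING.md` along the lines discussed above might look like the following sketch. This is hypothetical wording for illustration only, not the actual text of runc's or Incus's policy:

```markdown
## Contributions produced with AI tools

<!-- Hypothetical policy text; each project should adapt this to its own rules. -->

- Bug reports must describe behaviour you have observed and reproduced
  yourself. Reports that appear to be machine-generated and unverified
  may be closed as spam; if an LLM was involved, include the original
  prompt you used.
- If you used an LLM to help produce a patch, you must understand the
  change well enough to answer review questions in your own words.
  Submitters who cannot do so should expect the patch to be rejected.
- By adding a `Signed-off-by` line (DCO), you certify that you have the
  right to submit the contribution. Be aware that the provenance of
  LLM-generated code is legally unsettled.
```

Whether a project adopts a conditional policy like this or a blanket ban, as Incus did, is exactly the decision the issue asks the maintainers to make.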
