Servo has shown that we can build a browser with a modern, parallel layout engine at a fraction of the cost of the big incumbents, thanks to our powerful tooling, our strong community, and our thorough documentation. But we can, and should, build Servo without generative AI tools like GitHub Copilot.
I’m the lead author of our monthly updates and the Servo book, a member of the Technical Steering Committee, and a coauthor of our current AI policy (permalink). That policy was inspired by Gentoo’s AI policy, and has in turn inspired the AI policies of Loupe and Amaranth.
Recently the TSC voted in favour of two proposals that relax our ban on AI contributions. This was a mistake, and it was also a mistake to wait until after we had made our decision before seeking community feedback (see § On governance). § Your feedback made it clear that those proposals are the wrong way forward for Servo.
Within minutes of announcing them, someone pointed out that in the one example of an AI-assisted contribution we based them on, there appears to be a trivial logic error between the spec text and the code. If this is representative of the project’s AI-assisted future, then Servo is not fit for purpose.
I call on the TSC to explicitly reaffirm that generative AI tools like Copilot are not welcome in Servo, and make it clear that we intend to keep it that way indefinitely, in both our policy and our community, so we can start rebuilding trust. It’s not enough to say “oops, sorry, we will not be moving forward with these proposals”.
Like any logic written by humans, this policy does have some unintended consequences. Our intent was to ban AI tools that generate bullshit [a] in inscrutable ways, including GitHub Copilot and ChatGPT. But there are other tools that use similar underlying technology in more useful and less problematic ways (see § Potential exceptions). Reviewing these tools for use in Servo should be a community-driven process.
We should not punish contributors for honest mistakes, but we should make our policy easier to follow. Some ways to do this include documenting the tools that are known to be allowed and not allowed, documenting how to turn off features that are not allowed, and giving contributors a way to declare that they’ve read and followed the policy.
The declaration would be a good place to provide a dated link to the policy, giving contributors the best chance to understand the policy and knowingly follow it (or violate it). This is not perfect, and it won’t always be easy to enforce, but it should give contributors and maintainers a foundation of trust.
Potential exceptions
Proposals for exceptions should start in the community, and should focus on a specific tool used for a specific purpose. If the proposal is for a specific kind of tool, it must come with concrete examples of which tools are to be allowed. Much of the harm being caused by generative AI in the world around us comes from people using open-ended tools that are not fit for any purpose, or even treating them as if they were AGI.
The goal of these discussions would be to understand:
- the underlying challenges faced by contributors
- how effective the tool is for the purpose
- how well the tool and purpose mitigate the issues in the policy
- whether there are any existing or alternative solutions
- whether those solutions have problems that need to be addressed
Sometimes the purpose may need to be constrained to mitigate the issues in the policy. Let’s look at a couple of examples.
For some tasks, like speech recognition [b] and machine translation [c] [d], tools built on large language models and transformers are the state of the art (other than humans). This means those tools may be probabilistic tools, and strictly speaking, they may be generative AI tools, because the models they use are generative models. But generative AI does not necessarily mean “AI that generates bullshit in inscrutable ways”.
Speech recognition can be used in a variety of ways. If plumbed into ChatGPT, it will have all of the same problems as ChatGPT. If used for automatic captions, it can make videos and calls accessible to people who can’t hear well (myself included), but it can also infantilise us by censoring profanities and make serious errors that cause real harm. If deployed for that purpose by an online video platform, it can undermine the labour of human transcribers and lower the overall quality of captions.
If used as an input method, it would be a clear win for accessibility. My understanding of speech input tools is that they have a clear (if configurable) mapping from the things you say to the text they generate or the edits they make, so they may be a good fit.
In that case, maintainer burden, correctness, and security would not be issues, because the author is in complete control of what they write. Copyright issues seem less of a concern to me, since these tools operate on such a small scale (words and symbols) that they are unlikely to reproduce a copyrightable amount of text verbatim, but I am not a lawyer. As for ethical issues, these tools are generally trained once and then run on the author’s device. When used as an input method, they are not being used to undermine labour or justify layoffs. I’m not sure about the process of training their models.
Machine translation can also be used in a variety of ways. If deployed by a language learning app, it can ruin the quality of your core product, but hey, then you can lay off those pesky human translators. If used to localise your product, your users will finally be able to “compress to postcode file”. If used to localise your docs, it can make your docs worse than useless unless you take very careful precautions. What if we allowed contributors to use machine translation to communicate with each other, but not in code commits, documentation, or any other work products?
Deployed carelessly, machine translation will waste the reader’s time and undermine the labour of actual human translators who would otherwise be happy to contribute to Servo. If constrained to collaboration, it would still be far from perfect, but it may be worthwhile.
Maintainer burden should be mitigated, because this won’t change the amount or kind of text that needs to be reviewed. Correctness and security too, because this won’t change the text that can be committed to Servo. I can’t comment on the copyright issues, because I am not a lawyer. The ethical issues may be significantly reduced, because this use case wasn’t a market for human translators in the first place.
Your feedback
I appreciate the feedback you gave on the Fediverse, on Bluesky, and on Reddit. I also appreciate the comments on GitHub from several people who were more in favour of the proposals, even though we reached different conclusions in most cases. One such comment argued that it’s possible to use AI autocomplete safely by accepting the completions one word at a time.
That said, the overall consensus in our community was overwhelmingly clear, including among many of those who were in favour of the proposals. None of the benefits of generative AI tools are worth the cost in community goodwill [e].
Much of the dissent on GitHub was already covered by our existing policy, but there were quite a few arguments worth highlighting.
Speech-to-text input is ok [f] [g].
Machine translation is generally not useful or effective for technical writing [h] [i] [j]. It can be, if some precautions are taken [k]. It may be less ethically encumbered than generative AI tools [l]. Client-side machine translation is ok [m]. Machine translation for collaboration is ok [n] [o].
The proposals. Proposal 1 is ill-defined [p]. Proposal 2 has an ill-defined distinction between autocompletes and “full” code generation [q] [r] [s].
Documentation is just as technical as code [u]. Wrong documentation is worse than no documentation [v] [w] [x]. Good documentation requires human context [y] [z].
GitHub Copilot is not a good tool for answering questions [ab]. It isn’t even that good of a programming tool [ac]. Using it may be incompatible with the DCO [ad]. Using it could make us depend on Microsoft to protect us against legal liability [ae].
Correctness. Generative AI code is wrong at an alarming rate [af]. Generative AI tools will lie to us with complete confidence [ag]. Generative AI tools (and users of those tools) cannot explain their reasoning [ah] [ai]. Humans as supervisors are ill-equipped to deal with the subtle errors that generative AI tools make [aj] [ak] [al] [am]. Even experts can easily be misled by these tools [an]. Typing is not the hard part of programming [ao], as even some of those in favour have said:
“If I could offload that part of the work to copilot, I would be left with more energy for the challenging part.”
Project health. Partially lifting the ban will create uncertainty that increases maintainer burden for all contributions [ap] [aq]. Becoming dependent on tools with non-free models is risky [ar]. Generative AI tools may not be fair use [as] → [at]. Outside of Servo, people have already spent a great deal of time cleaning up after LLM-generated messes [au].
Material. A Servo contributor refuses to spend time cleaning up after LLM-generated messes [av]. Others will stop donating [aw] [ax] [ay] [az] [ba] [bb] [bc] [bd] [be] [bf] [bg], will stop contributing [bh], will not start donating [bi], will not start contributing [bj] [bk], or will not start promoting [bl] the project.
Broader context. Allowing AI contributions sends a bad signal about the project’s relationship with the broader AI movement [bm] [bn] [bo]. The modern AI movement is backed by overwhelming capital interests, and must be opposed equally strongly [bp]. People often “need” GitHub or Firefox, but no one “needs” Servo, so we can and should be held to a higher standard [bq]. Rejection of AI is only credible if the project rejects AI contributions [br]. We can attract funding from AI-adjacent parties without getting into AI ourselves [bs], though that may be easier said than done [bt].
On governance
Several people have raised concerns about how Servo’s governance could have led to this decision, and some have even suspected foul play. But like most TSC business, the discussion around AI contributions happened mostly async on Zulip, and we didn’t save anything special for the synchronous monthly public calls. As a result, whenever the discussion overflowed the sync meeting, we simply continued it internally, so the public minutes were missing the vast majority of the discussion (and the decisions). These decisions should probably have happened in public.
Our decisions followed the TSC’s usual process, with a strong preference for resolving disagreements by consensus rather than by voting, but we didn’t have any consistent structure for moving from consensus-seeking to a vote. This may have made the decision process prone to being blocked and dominated by the most persistent participants.
Contrast this with decision making within Igalia, where we also prefer consensus before voting, but the consensus process is always used to inform proposals that are drafted by more than one person and then always voted on. Most polls are “yes” or “no” by majority, and only a few polls for the most critical matters allow vetoing. This ensures that proposals have meaningful support before being considered, and if only one person is strongly against something, they are heard but they generally can’t single-handedly block the decision with debate.
We also didn’t have any consistent structure for community consultation, so it wasn’t clear how or when we should seek feedback. A public RFC process may have helped with this, and would also help us collaborate on and document other decisions.
More personally, until fairly late I did not participate in the extensive discussion in January and February that helped move consensus in the TSC towards allowing the non-code and Copilot exceptions. Some of that was because I was on leave, including for the vote on the initial Copilot “experiments”, but most of it was that I didn’t have the bandwidth. Doing politics is hard, exhausting work, and there’s only so much of it you can do, even when you’re not wearing three other hats.