Servo has shown that we can build a browser with a modern, parallel layout engine at a fraction of the cost of the big incumbents, thanks to our powerful tooling, our strong community, and our thorough documentation. But we can, and should, build Servo without generative AI tools like GitHub Copilot.
I’m the lead author of our monthly updates and the Servo book, a member of the Technical Steering Committee, and a coauthor of our current AI policy (permalink). That policy was inspired by Gentoo’s AI policy, and has in turn inspired the AI policies of Loupe and Amaranth.
Recently the TSC voted in favour of two proposals that relax our ban on AI contributions. This was a mistake, and it was also a mistake to wait until after we had made our decision before seeking community feedback (see § On governance). § Your feedback made it clear that those proposals are the wrong way forward for Servo.
Within minutes of announcing them, someone pointed out that in the one example of an AI-assisted contribution we based them on, there appears to be a trivial logic error between the spec text and the code. If this is representative of the project’s AI-assisted future, then Servo is not fit for purpose.
I call on the TSC to explicitly reaffirm that generative AI tools like Copilot are not welcome in Servo, and make it clear that we intend to keep it that way indefinitely, in both our policy and our community, so we can start rebuilding trust. It’s not enough to say “oops, sorry, we will not be moving forward with these proposals”.
Like any logic written by humans, this policy does have some unintended consequences. Our intent was to ban AI tools that generate bullshit [a] in inscrutable ways, including GitHub Copilot and ChatGPT. But there are other tools that use similar underlying technology in more useful and less problematic ways (see § Potential exceptions). Reviewing these tools for use in Servo should be a community-driven process.
We should not punish contributors for honest mistakes, but we should make our policy easier to follow. Some ways to do this include documenting the tools that are known to be allowed and not allowed, documenting how to turn off features that are not allowed, and giving contributors a way to declare that they’ve read and followed the policy.
The declaration would be a good place to provide a dated link to the policy, giving contributors the best chance to understand the policy and knowingly follow it (or violate it). This is not perfect, and it won’t always be easy to enforce, but it should give contributors and maintainers a foundation of trust.
Potential exceptions
Proposals for exceptions should start in the community, and should focus on a specific tool used for a specific purpose. If the proposal is for a specific kind of tool, it must come with concrete examples of which tools are to be allowed. Much of the harm being caused by generative AI in the world around us comes from people using open-ended tools that are not fit for any purpose, or even treating them as if they were AGI.
The goal of these discussions would be to understand:
- the underlying challenges faced by contributors
- how effective the tool is for the purpose
- how well the tool and purpose mitigate the issues in the policy
- whether there are any existing or alternative solutions
- whether those solutions have problems that need to be addressed
Sometimes the purpose may need to be constrained to mitigate the issues in the policy. Let’s look at a couple of examples.
For some tasks, like speech recognition [b] and machine translation [c] [d], tools built on large language models and transformers are the state of the art (other than humans). This means those tools may be probabilistic tools, and strictly speaking, they may be generative AI tools, because the models they use are generative models. But generative AI does not necessarily mean “AI that generates bullshit in inscrutable ways”.
Speech recognition can be used in a variety of ways. If plumbed into ChatGPT, it will have all of the same problems as ChatGPT. If used for automatic captions, it can make videos and calls accessible to people who can’t hear well (myself included), but it can also infantilise us by censoring profanities and make serious errors that cause real harm. If deployed for that purpose by an online video platform, it can undermine the labour of human transcribers and lower the overall quality of captions.
If used as an input method, it would be a clear win for accessibility. My understanding of speech input tools is that they have a clear (if configurable) mapping from the things you say to the text they generate or the edits they make, so they may be a good fit.
In that case, maintainer burden, correctness, and security would not be issues, because the author is in complete control of what they write. Copyright issues seem less of a concern to me, since these tools operate on such a small scale (words and symbols) that they are unlikely to reproduce a copyrightable amount of text verbatim, but I am not a lawyer. As for ethical issues, these tools are generally trained once and then run on the author’s device. When used as an input method, they are not being used to undermine labour or justify layoffs. I’m not sure about the process of training their models.
Machine translation can also be used in a variety of ways. If deployed by a language learning app, it can ruin the quality of your core product, but hey, then you can lay off those pesky human translators. If used to localise your product, your users will finally be able to “compress to postcode file”. If used to localise your docs, it can make your docs worse than useless unless you take very careful precautions. What if we allowed contributors to use machine translation to communicate with each other, but not in code commits, documentation, or any other work products?
Deployed carelessly, machine translation will waste the reader’s time and undermine the labour of actual human translators who would otherwise be happy to contribute to Servo. If constrained to collaboration, it would still be far from perfect, but it may be worthwhile.
Maintainer burden should be mitigated, because this won’t change the amount or kind of text that needs to be reviewed. Correctness and security too, because this won’t change the text that can be committed to Servo. I can’t comment on the copyright issues, because I am not a lawyer. The ethical issues may be significantly reduced, because this use case wasn’t a market for human translators in the first place.
Your feedback
I appreciate the feedback you gave on the Fediverse, on Bluesky, and on Reddit. I also appreciate the comments on GitHub from several people who were more in favour of the proposals, even though we reached different conclusions in most cases. One such comment argued that it’s possible to use AI autocomplete safely by accepting the completions one word at a time.
That said, the overall consensus in our community was overwhelmingly clear, including among many of those who were in favour of the proposals. None of the benefits of generative AI tools are worth the cost in community goodwill [e].
Much of the dissent on GitHub was already covered by our existing policy, but there were quite a few arguments worth highlighting.
Speech-to-text input is ok [f] [g].
Machine translation is generally not useful or effective for technical writing [h] [i] [j]. It can be, if some precautions are taken [k]. It may be less ethically encumbered than generative AI tools [l]. Client-side machine translation is ok [m]. Machine translation for collaboration is ok [n] [o].
The proposals. Proposal 1 is ill-defined [p]. Proposal 2 has an ill-defined distinction between autocompletes and “full” code generation [q] [r] [s].
Documentation is just as technical as code [u]. Wrong documentation is worse than no documentation [v] [w] [x]. Good documentation requires human context [y] [z].
GitHub Copilot is not a good tool for answering questions [ab]. It isn’t even that good of a programming tool [ac]. Using it may be incompatible with the DCO [ad]. Using it could make us depend on Microsoft to protect us against legal liability [ae].
Correctness. Generative AI code is wrong at an alarming rate [af]. Generative AI tools will lie to us with complete confidence [ag]. Generative AI tools (and users of those tools) cannot explain their reasoning [ah] [ai]. Humans as supervisors are ill-equipped to deal with the subtle errors that generative AI tools make [aj] [ak] [al] [am]. Even experts can easily be misled by these tools [an]. Typing is not the hard part of programming [ao], as even some of those in favour have said:
“If I could offload that part of the work to copilot, I would be left with more energy for the challenging part.”
Project health. Partially lifting the ban will create uncertainty that increases maintainer burden for all contributions [ap] [aq]. Becoming dependent on tools with non-free models is risky [ar]. Generative AI tools may not be fair use [as] → [at]. Outside of Servo, people have already spent a great deal of time cleaning up after LLM-generated messes [au].
Material. A Servo contributor refuses to spend time cleaning up after LLM-generated messes [av]. Others will stop donating [aw] [ax] [ay] [az] [ba] [bb] [bc] [bd] [be] [bf] [bg], will stop contributing [bh], will not start donating [bi], will not start contributing [bj] [bk], or will not start promoting [bl] the project.
Broader context. Allowing AI contributions sends a bad signal about the project’s relationship with the broader AI movement [bm] [bn] [bo]. The modern AI movement is backed by overwhelming capital interests, and must be opposed equally strongly [bp]. People often “need” GitHub or Firefox, but no one “needs” Servo, so we can and should be held to a higher standard [bq]. Rejection of AI is only credible if the project rejects AI contributions [br]. We can attract funding from AI-adjacent parties without getting into AI ourselves [bs], though that may be easier said than done [bt].
On governance
Several people have raised concerns about how Servo’s governance could have led to this decision, and some have even suspected foul play. But like most TSC business, the discussion around AI contributions happened mostly async on Zulip, and we didn’t save anything special for the synchronous monthly public calls. As a result, whenever the discussion overflowed the sync meeting, we simply continued it internally, so the public minutes were missing the vast majority of the discussion (and the decisions). These decisions should probably have happened in public.
Our decisions followed the TSC’s usual process, with a strong preference for resolving disagreements by consensus rather than by voting, but we didn’t have any consistent structure for moving from consensus-seeking to a vote. This may have made the decision process prone to being blocked and dominated by the most persistent participants.
Contrast this with decision making within Igalia, where we also prefer consensus before voting, but the consensus process is always used to inform proposals that are drafted by more than one person and then always voted on. Most polls are “yes” or “no” by majority, and only a few polls for the most critical matters allow vetoing. This ensures that proposals have meaningful support before being considered, and if only one person is strongly against something, they are heard but they generally can’t single-handedly block the decision with debate.
We also didn’t have any consistent structure for community consultation, so it wasn’t clear how or when we should seek feedback. A public RFC process may have helped with this, and would also help us collaborate on and document other decisions.
More personally, until fairly late I did not participate in the extensive discussion in January and February that helped move consensus in the TSC towards allowing the non-code and Copilot exceptions. Some of that was because I was on leave, including for the vote on the initial Copilot “experiments”, but most of it was that I didn’t have the bandwidth. Doing politics is hard, exhausting work, and there’s only so much of it you can do, even when you’re not wearing three other hats.