AI and the Ship of Theseus

Original link: https://lucumr.pocoo.org/2026/3/5/theseus/

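## The Future of Software Licensing and AI-Driven Reimplementation

The falling cost of generating code, especially with AI, is challenging traditional software licensing models. The recent `chardet` episode is a case in point: its maintainer reimplemented the library from scratch using only its API and test suite, and changed the license from LGPL to MIT. This sparked a dispute. The original author considers the result a derivative work, while the maintainer says it is a new implementation.

This highlights a key shift: even open-source code under a copyleft license such as the GPL can now be rewritten with little effort. Reimplementation projects for `readline` and even `bash` show that this is feasible. The core question is philosophical as well as legal: can copyright still be enforced when AI makes recreating software effortless?

The author argues that this trend will lead to software re-emerging under more permissive licenses, or even as proprietary software, and suggests that authors rely on trademarks rather than licenses. While acknowledging that conflicts are likely, the author sees this as a positive development, favors open sharing over restrictive licensing, and expects more "slopforks" (AI-assisted reimplementations) along with the legal debates they bring.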

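## AI, Licensing, and Code Ownership: Hacker News Discussion Summary

A Hacker News discussion, prompted by an article on AI and the "Ship of Theseus" thought experiment, explored the complex implications of AI-generated code for software licensing.

The core debate was whether code produced by large language models (LLMs) trained on GPL-licensed code inherently inherits that license. Some argued that if an LLM draws on GPL code, any output it generates *should* be GPL, effectively forcing proprietary software to become open source. Others countered that reimplementing functionality (citing Compaq's clean-room reimplementation of the IBM PC BIOS) is fair use, and that only substantially similar code infringes copyright.

A key point was that, because of this uncertainty, the range of open-source licensing choices is effectively collapsing toward two poles: the permissive MIT license or a fully proprietary model. Commenters expressed frustration with people who disregard license terms, and acknowledged that the legal landscape is currently unclear and will likely need precedent to settle it. The discussion also touched on related projects such as Toybox, recalling past licensing disputes.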

Original article

written on March 05, 2026

Code keeps getting cheaper to write, and that includes re-implementations. I mentioned recently that I had an AI port one of my libraries to another language, and it ended up choosing a different design for that implementation. In many ways the functionality was the same, but the path it took to get there was different. The port worked by going through the test suite.

Something related, but different, happened with chardet. The current maintainer reimplemented it from scratch by pointing a coding agent only at the API and the test suite. The motivation: enabling relicensing from LGPL to MIT. I personally have a horse in this race, because I too have wanted chardet under a non-GPL license for many years, so consider me very biased in that regard.
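Re-implementing "via the test suite" means the tests, not the original code, act as the specification. As a minimal sketch, assuming a hypothetical `detect(data)` function that returns an encoding name or `None` (deliberately simpler than chardet's real API, which returns a dictionary that includes a confidence score), here are two implementations with different designs that satisfy the same behavioral suite. All names below are illustrative.

```python
import codecs
from typing import Callable, List, Optional, Tuple

# Hypothetical API for illustration only: detect(data) -> encoding name or None.
# This is NOT chardet's real API.
DetectFn = Callable[[bytes], Optional[str]]

# The "specification": input bytes paired with the expected answer.
CASES: List[Tuple[bytes, Optional[str]]] = [
    (b"hello", "ascii"),
    ("h\u00e9llo".encode("utf-8"), "utf-8"),
    (codecs.BOM_UTF8 + b"hi", "utf-8-sig"),
    (b"\xff\xfe\xfd", None),  # neither valid ASCII nor valid UTF-8
]


def run_suite(detect: DetectFn) -> List[Tuple[bytes, Optional[str], Optional[str]]]:
    """Return (input, expected, actual) for every failing case; empty means it conforms."""
    return [(data, want, detect(data)) for data, want in CASES if detect(data) != want]


# Implementation A: imperative; checks the BOM, then tries decoders in order.
def detect_sequential(data: bytes) -> Optional[str]:
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for enc in ("ascii", "utf-8"):
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Implementation B: data-driven; a table of (predicate, encoding) rules.
def _decodes_as(enc: str) -> Callable[[bytes], bool]:
    def check(data: bytes) -> bool:
        try:
            data.decode(enc)
            return True
        except UnicodeDecodeError:
            return False
    return check


_RULES: List[Tuple[Callable[[bytes], bool], str]] = [
    (lambda d: d.startswith(codecs.BOM_UTF8), "utf-8-sig"),
    (lambda d: all(b < 0x80 for b in d), "ascii"),
    (_decodes_as("utf-8"), "utf-8"),
]


def detect_table(data: bytes) -> Optional[str]:
    # The first matching rule wins; fall back to None.
    return next((enc for matches, enc in _RULES if matches(data)), None)


for impl in (detect_sequential, detect_table):
    failures = run_suite(impl)
    print(impl.__name__, "conforms" if not failures else f"fails: {failures}")
```

Both functions pass the same suite without sharing any code, which is the sense in which a test suite can pin down behavior without pinning down the implementation.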

Unsurprisingly, that new implementation caused a stir. In particular, Mark Pilgrim, the original author of the library, objects to the new implementation and considers it a derived work. The current maintainer, who has maintained it for the last 12 years, considers it a new work and instructed his coding agent to produce precisely that. According to the maintainer, a comparison with JPlag shows that the new implementation is distinct. If you actually consider how it works, that’s not too surprising: it’s significantly faster than the original implementation, supports multiple cores, and uses a fundamentally different design.
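For intuition about what a tool like JPlag checks: it compares programs as normalized token sequences, so renaming variables or changing literals does not hide copied structure. JPlag itself matches token sequences with Greedy String Tiling. The minimal sketch below swaps in a simpler Jaccard similarity over token trigrams and handles only Python source. The function names are hypothetical, and this is not how the chardet comparison was actually run.

```python
import io
import keyword
import tokenize
from typing import List, Set, Tuple

# Token types that carry layout rather than structure; dropped before comparing.
_SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def normalize(source: str) -> List[str]:
    """Turn Python source into a token stream with identifiers and literals abstracted."""
    out: List[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords (def, for, return, ...) but erase identifier names.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of token n-gram sets; 1.0 means structurally identical."""
    ga, gb = ngrams(normalize(a), n), ngrams(normalize(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


original = "def f(x):\n    return x + 1\n"
renamed = "def g(y):\n    return y + 2\n"  # same structure, new names and literal
redesigned = "def g(y):\n    total = 0\n    for c in y:\n        total += c\n    return total\n"

print(similarity(original, renamed))     # 1.0: renaming alone does not hide copying
print(similarity(original, redesigned))  # much lower: the structure differs
```

A rewrite that only renames things scores as a copy, while a different design scores low, which is why a fundamentally different architecture tends to come out as distinct under this kind of check.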

What I find more interesting about this question are the consequences of where we are now. Copyleft licenses like the GPL depend heavily on copyright, and on friction, to be enforced. But because the code is fundamentally in the open, with or without tests, you can trivially rewrite it these days. I myself have been intending to do this for a while now with some other GPL libraries. In particular, I started a re-implementation of readline a while ago for similar reasons, because of its GPL license. There is an obvious moral question here, but that isn’t necessarily what I’m interested in. Just as GPL software might re-emerge as MIT software, so might proprietary abandonware.

For me personally, what is more interesting is that we might not be able to copyright these creations at all. A court might still rule that all AI-generated code is in the public domain because there was not enough human input in it. That’s possible, though probably not very likely.

But all of this is leading to some interesting new developments that we are not necessarily ready for. Vercel, for instance, happily re-implemented bash with Clankers but got visibly upset when someone re-implemented Next.js in the same way.

There are huge consequences to this. When the cost of generating code goes down that much, and we can re-implement it from test suites alone, what does that mean for the future of software? Will we see a lot of software re-emerging under more permissive licenses? Will we see a lot of proprietary software re-emerging as open source? Will we see a lot of software re-emerging as proprietary?

It’s a new world, and we have very little idea how to navigate it. In the interim we will see some fights over copyright, but I have a feeling very few of them will go to court, because everyone involved is actually somewhat scared of setting a precedent.

In the GPL case, though, I think it rekindles some old fights about copyleft vs. permissive licenses that we have not seen in a long time. It probably does not feel great to have one’s work rewritten with a Clanker and one’s authorship erased. Unlike the Ship of Theseus, though, this seems more clear-cut: if you throw away all the code and start from scratch, even if the end result behaves the same, it’s a new ship. It only carries over the name, which may be another argument for why authors should hold on to trademarks rather than rely on licenses and contract law.

I personally think all of this is exciting. I’m a strong supporter of putting things in the open with as little license enforcement as possible. I think society is better off when we share, and I consider the GPL to run against that spirit by restricting what can be done with it. This development plays into my worldview. I understand, though, that not everyone shares that view, and I expect more fights over the emergence of slopforks as a result. After all, it combines two very heated topics, licensing and AI, in the worst possible way.

This entry was tagged ai, licensing and python
