Endless AI-Generated Wikipedia

Original link: https://www.seangoedecke.com/endless-wiki/

## Endless Wiki: Exploring an AI-Generated Universe

EndlessWiki (endlesswiki.com) is an experimental, AI-powered wiki designed to explore the knowledge contained in large language models. Inspired by Borges' infinite library, it generates pages *on demand* as users click links, effectively mining the LLM's internal "knowledge" in a browsable form.

The system uses a simple architecture: a database and a Golang server, with the Kimi K2 model served via Groq for fast page generation (under half a second). Unlike similar projects, EndlessWiki is free and requires no sign-up.

After one user exploited the system by repeatedly triggering page creation and running up costs, the creator temporarily disabled generation of new pages. A key design element prevents direct access to pages via URL, forcing users to navigate through links so that they genuinely explore the model's internal conceptual connections. The project aims to demonstrate a novel way of interacting with LLMs beyond the traditional chat interface, and could pave the way for features such as automatically hyperlinked AI responses.

## Endless AI-Generated Wikipedia: Brief Summary

Seangoedecke.com launched an AI-powered, endlessly generated wiki, but temporarily disabled new page creation after a user (or bot) exploited the system by rapidly clicking links, running up a $70 bill. The incident highlights how vulnerable public websites are to automated traffic, even absent malicious intent; search-engine crawlers pose a similar challenge.

Discussion centered on the irony that the project was undermined by the same kind of scraping used to *train* the underlying AI models. The creator later re-enabled page generation with rate limiting and a different model (openai/gpt-oss-120b).

Users debated the project's value: some found it prone to hallucination and less interesting than the human-edited Wikipedia, while others saw its potential as a large knowledge base, particularly for pretraining LLMs, and suggested improvements such as per-IP rate limiting and a deeper research mode. Several users hit dead ends and missing links while browsing the generated content.

Original article

edit: I’ve disabled new page generation for now because someone ran a script overnight to endlessly click links and cost me $70. I don’t really understand why anybody would do that.

I built an infinite, AI-generated wiki. You can try it out at endlesswiki.com!

*[Screenshot: endlesswiki]*

## Why build an AI-generated wiki?

Large language models are like Borges’ infinite library. They contain a huge array of possible texts, waiting to be elicited by the right prompt - including some version of Wikipedia. What if you could explore a model by interacting with it as a wiki?

The idea here is to build a version of Wikipedia where all the content is AI-generated. You only have to generate a single page to get started: when a user clicks any link on that page, the page for that link is generated on-the-fly, which will include links of its own. By browsing the wiki, users can dig deeper into the stored knowledge of the language model.

This works because wikipedias connect topics very broadly. If you follow enough links, you can get from any topic to any other topic. In fact, people already play a game where they try to race from one page to a totally unrelated page by just following links. It’s fun to try and figure out the most likely chain of conceptual relationships between two completely different things.

In a sense, EndlessWiki is a collaborative attempt to mine the depths of a language model. Once a page is generated, all users will be able to search for it or link it to their friends.

## Architecture

The basic design is very simple: a MySQL database with a pages table, and a Golang server. When the server gets a request for /wiki/some-slug, it looks up some-slug in the database. If the page exists, it serves it directly; if not, it generates the page with an LLM and saves it to the database before serving it.
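The lookup-or-generate flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual server code: an in-memory map stands in for the MySQL pages table, and `generatePage` is a hypothetical stub for the LLM call (Kimi K2 via Groq).

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// pageStore stands in for the MySQL pages table described in the post.
type pageStore struct {
	mu    sync.Mutex
	pages map[string]string // slug -> rendered page body
}

// generatePage is a placeholder for the real LLM call.
func generatePage(slug string) string {
	title := strings.ReplaceAll(slug, "-", " ")
	return fmt.Sprintf("<h1>%s</h1><p>Generated article body…</p>", title)
}

// getOrGenerate serves a cached page if it exists; otherwise it generates
// the page and stores it before returning — the core lookup-or-generate flow.
func (s *pageStore) getOrGenerate(slug string) (body string, generated bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cached, ok := s.pages[slug]; ok {
		return cached, false
	}
	body = generatePage(slug)
	s.pages[slug] = body
	return body, true
}

func main() {
	store := &pageStore{pages: map[string]string{}}
	_, gen1 := store.getOrGenerate("neon-genesis-evangelion")
	_, gen2 := store.getOrGenerate("neon-genesis-evangelion")
	fmt.Println(gen1, gen2) // first request generates, second hits the cache
}
```

Because generation only happens on a cache miss, each page is paid for at most once, no matter how many users later visit it.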

I’m using Kimi K2 for the model. I chose a large model because larger models contain more facts about the world (which is good for a wiki), and Kimi specifically because in my experience Groq is faster and more reliable than other model inference providers. Speed is really important for this kind of application, because the user has to wait for new pages to be generated. Fortunately, Groq is fast enough that the wait time is only a few hundred ms.

Unlike AutoDeck, I don’t charge any money or require sign-in for this. That’s because this is more of a toy than a tool, so I’m not worried about one power user costing me a lot of money in inference. You have to be manually clicking links to trigger inference.

The most interesting design decision I made was preventing “cheating”. I’m excited to see how obscure the pages can get (for instance, can you eventually get to Neon Genesis Evangelion from the root page?) It would defeat the purpose if you could just manually type /wiki/neon-genesis-evangelion into the address bar. To prevent that, I give each link an origin=slug query parameter, and then fetch the origin page server-side to validate that it does indeed contain a link to the page you’re navigating to.
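The origin check can be sketched like this. It is a simplified illustration under assumed details: the real server fetches the origin page's stored body from the database, and the exact link markup may differ; `linkExists` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// linkExists checks whether the origin page's body actually contains a link
// to the target slug. If it doesn't, the navigation was likely typed into
// the address bar rather than clicked, and the server can refuse to
// generate the page.
func linkExists(originBody, targetSlug string) bool {
	return strings.Contains(originBody, `href="/wiki/`+targetSlug+`"`)
}

func main() {
	origin := `<p>See <a href="/wiki/anime">anime</a> for more.</p>`
	fmt.Println(linkExists(origin, "anime"))                   // link present: allowed
	fmt.Println(linkExists(origin, "neon-genesis-evangelion")) // no link: rejected
}
```

Since only already-generated pages can serve as origins, every new page is reachable through a chain of real clicks back to the root.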

## Final thoughts

Like AutoDeck, EndlessWiki represents another step in my “what if you could interact with LLMs without having to chat” line of thought. I think there’s a lot of potential here for non-toy features. For instance, what if ChatGPT automatically hyperlinked each proper noun in its responses, and clicking on those generated a response focused on that noun?

Anyway, check it out!
