Roko's Dancing Basilisk

Original link: https://boston.conman.org/2025/12/02.1

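## DeepWiki and LLM-generated documentation: a mixed bag

The author tried DeepWiki, a tool that uses an LLM to automatically generate documentation for GitHub repositories, starting with `mod_blog`, a 26-year-old personal project. The initial results were surprisingly good: nearly 30 pages covering key aspects such as storage and workflows. There were inconsistencies, however. For example, a diagram showed five layers while the text claimed only three "primary" layers. The site interface itself was also criticized for clumsy menus, inconsistent diagram sizes, and excessive repetition.

Testing with `a09`, a more complex 6809 assembler project, turned up noticeably more errors. These included misclassification, incorrect explanations of the code, and completely fabricated information in a key table. The author suspects this stems from the project's higher complexity exceeding the LLM's context window.

Although better than trying to generate code with an LLM, the documentation is too inaccurate for unfamiliar codebases. Keeping it accurate would require continual updates with every code change, which may take more effort than writing comments directly in the code. The author concludes that the tool shows promise but is not ready for widespread use, especially on large legacy projects, where it would need substantial manual review and maintenance.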


Original text

Roko's dancing basilisk

I came across a reference to DeepWiki, a site that will generate “documentation” for any GitHub repository. I can't say I've been impressed with LLMs generating code, but what about documentation? I haven't tried that yet. Let's see how well Roko's basilisk dances!

Initially, I started with mod_blog. I've been working with the codebase for 26 years now, so it should be easy for me to spot inaccuracies in the “documentation.” Even better, there's no interaction with a sycophantic chat bot; just plop in the URL for the repo, supply an email for notification when it's done and, as the Brits say, “Bob's your uncle!”

Anyway, the email came. I checked, and I was quickly amazed! Nearly 30 pages of documentation, and the overview was impressive. It picked up on tumblers, the storage layout, the typical flows in adding a new entry. It even got the fact that cmd_cgi_get_today() returns all the entries for a given day of the month throughout the years. But there was one bit that was just a tad off. It stated “[t]he system consists of three primary layers” but the following diagram showed five layers, with no indication of which three were the “primary layers.” I didn't have a problem with the layers it did identify:

  • Entry Layer
  • Processing Layer
  • Rendering Layer
  • Storage Layer
  • Configuration

Just that it seems to have a problem counting to three.
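As an aside on the cmd_cgi_get_today() behavior mentioned above (every entry for a given day of the month across the years), here is a rough sketch of just the date enumeration. It is not mod_blog's code: the function name same_day_each_year(), its parameters, and the YYYY/MM/DD string form (borrowed from this site's URLs) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Hypothetical sketch, not cmd_cgi_get_today() itself: write up to `max`
 * "YYYY/MM/DD" strings for the same month and day in every year from
 * first_year through last_year. February 29 is skipped in non-leap years.
 * Only basic range checks are done on month and day.
 * Returns the number of dates written. */
size_t same_day_each_year(int first_year, int last_year, int month, int day,
                          char out[][11], size_t max)
{
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);

    size_t n = 0;
    for (int y = first_year; y <= last_year && n < max; y++) {
        if (month == 2 && day == 29 && !is_leap_year(y))
            continue;   /* no such date this year */
        snprintf(out[n], sizeof out[n], "%04d/%02d/%02d", y, month, day);
        n++;
    }
    return n;
}
```

A real implementation would then look each date up in the blog's storage and keep only the days that actually have entries; that lookup is left out here because the storage layout isn't described in enough detail.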

Before I get into a review of the rest of the contents, I'll mention briefly my opinions on the web site as interface: it's meh. The menu on the left is longer than it appears, given that scroll bars seem oh so last century (really! I would love to force “web designers” to use old-fashioned three-button mice and a monitor calibrated to simulate color-blindness, just to see them struggle with their own designs; not everyone has a mouse with a scroll-wheel, nor an Apple Trackpad). Also, the diagrams are very inconsistent, and oftentimes way too small to view properly, even when selected. Then you'll get the occasional gigantic diagram. The layouts seem arbitrary: some horizontal, some vertical, and some L-shaped.

And it repeats itself excessively. I can maybe understand that across pages, saving a person excessive navigation, but I found it repeating itself even on a single page.

Other than those issues, it's mostly functional. Even with Javascript off, it's viewable, even if the diagrams are missing and the contrast is low.

One aspect I did like is the links at the end of each section referring to the source. That's a nice touch.

So with that out of the way—the “documentation” itself.

Mostly correct. I have a bunch of small quibbles:

  1. examples of running it on the command line don't need the --config option if $BLOG_CONFIG is set;
  2. $BLOG_CONFIG isn't checked in main.c but in blog.c;
  3. mod_blog outputs RSS 0.91, not RSS 2.0;
  4. “The system is written entirely in C and does not have Perl, Python or other scripting dependencies for the core engine itself.” Perhaps true? I mean, I do use Lua, but only for the configuration file;
  5. missed out how SUID is used (not for root to run, but as the owner of the blog);
  6. the posthook script returning failure doesn't mean the entry wasn't added, it just changes the HTTP status code returned.
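
To make the first two quibbles concrete, here is a minimal sketch of how a program might pick its configuration file from either a command-line option or $BLOG_CONFIG. It is not the code in blog.c: the function names pick_config() and resolve_config_path() are made up, and the precedence (an explicit --config wins over the environment variable) is my assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not mod_blog's actual code: choose the configuration
 * file for a command-line run. An explicit "--config <path>" argument is
 * assumed to take priority; otherwise fall back to env_value, which the
 * caller reads from the BLOG_CONFIG environment variable. Returns NULL
 * when neither source is available. */
const char *pick_config(int argc, char *argv[], const char *env_value)
{
    assert(argc >= 0 && argv != NULL);

    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "--config") == 0)
            return argv[i + 1];   /* the path follows the option */
    }
    return env_value;             /* may be NULL if the variable is unset */
}

/* Thin wrapper that consults the real environment. */
const char *resolve_config_path(int argc, char *argv[])
{
    return pick_config(argc, argv, getenv("BLOG_CONFIG"));
}
```

Passing the environment value in as a parameter keeps pick_config() easy to test without touching the process environment; with BLOG_CONFIG set, a bare command line with no --config still resolves to a path.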

I also found two problematic bits of code when reviewing this “documentation”: one is an actual bug in the code (the file locking diagram, while accurate to the code, made a caching issue stand out) and the other is a place where I used a literal constant instead of a defined constant. At least I'm glad to have found those two issues, even if neither has been an actual exploitable bug yet (as I think I'm the only one using mod_blog).

In the grand scheme of things, not terrible for something that might have taken 10 minutes to generate (I'm not sure—I did other things waiting for the email to arrive).

But one repo does not a trend make. So I decided upon doing this again with a09, my 6809 assembler. It's a similar size (mod_blog is 7,400 lines, a09 is 9,500—same ballpark) but it's a bit more complicated in logic and hasn't had 26 years of successive refinement done on it. As such, I found way more serious issues:

  1. Errors aren't classified. Errors are created as needed, sequentially. I make no attempt to bunch error codes into fixed ranges.
  2. It missed a key element of the dead code detection—it only triggers if the following instruction doesn't have a label.
  3. The listing file isn't kept in the presence of errors.
  4. It also got the removal of generated output files incorrect—they're only deleted if an error was detected on pass 1 or 2, not if a test failed.
  5. It repeats the precedence table on the same page.
  6. I do not have “Unsupported markdown: blockquote” or “Unsupported markdown: list” unary operators.
  7. Oh my God! I can't say how bad this backend matrix table is. It's all sorts of wrong. It's not that it got the supported/non-supported markers backwards, it appears to have just made up the results! And the same information on another page is bad as well. Not as bad as the first, but that's like saying bronchitis is not as bad as pneumonia. Both are bad. And it uses a different format for both tables. Consistency for the win! Sheesh.
  8. The example of writing an instruction to the various formats is wrong for the RS-DOS version—the type and length should be two bytes each, not one.
  9. The output format for -t is incorrect—it doesn't show a trace of the code being run unless the TRON directives are in use.
  10. Every example of the .ASSERT directive is just wrong as it did not use the proper register references, and memory dereferences need a @ (8-bit) or @@ (16-bit) prefix.
  11. Where you can use the .TRON directive is wrong: it can be used anywhere; it's .OPT TEST TRON that can only be used inside a .TEST directive.
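
The rule in item 2 is easy to state in code. What follows is a sketch of the idea only, not a09's implementation: the struct line type, both function names, and the list of mnemonics treated as never falling through are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed source line; a09's real data structures differ. */
struct line {
    const char *label;    /* NULL when the line carries no label */
    const char *mnemonic; /* e.g. "LDA", "BRA", "RTS" */
};

/* True for 6809 instructions that never fall through to the next line.
 * The list is an illustrative assumption, not taken from a09. */
bool never_falls_through(const char *mnemonic)
{
    static const char *const ops[] = { "BRA", "LBRA", "JMP", "RTS", "RTI" };

    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (strcmp(mnemonic, ops[i]) == 0)
            return true;
    return false;
}

/* Count lines that would be flagged as dead code: an instruction directly
 * after an unconditional transfer of control, and only when it has no
 * label (a label means some other code may jump to it). */
size_t count_dead_lines(const struct line *lines, size_t n)
{
    assert(lines != NULL || n == 0);

    size_t dead = 0;
    for (size_t i = 1; i < n; i++)
        if (never_falls_through(lines[i - 1].mnemonic) && lines[i].label == NULL)
            dead++;
    return dead;
}
```

The label check is the part the generated “documentation” left out: an instruction after an RTS is not flagged when it carries a label, because something else can still reach it.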

This, in my mind, is a much worse job than it did for mod_blog. I suspect it's due to the cyclomatic complexity being a bit higher in a09 than in mod_blog due to the cross-cutting nature of the code. And that probably causes the LLM to run up to, if not a bit over, its context window, thus causing the confabulations.

I fear that this is meant to be used for legacy code with little or no documentation, and if it does this poorly on a moderately complex but small code base, I don't want to contemplate what it would do for a larger, older, and gnarlier codebase. I'd be up to try it, and I have a code base of 155,000 lines of C code written in the early 90s that's as gnarly as it gets, but I'm not familiar enough with that codebase to feel confident that I can spot all the glaring errors, much less the more subtle issues.

Another issue is updates to the repo. The site sells itself as a wiki, so I suppose another aspect to this is that you spend the time going through the generated “documentation” and fixing the errors, and then keep it up to date as the code changes. It's not obvious to me if one can rerun this over a changed repo, and if so, are the updates merged into the existing documentation? Or is it replaced outright, so that you have to go through fixing the documentation again? I suspect this generated “documentation” will end up worse than bad comments in the code itself.

mod_blog has changed drastically over the years, and while the storage format itself hasn't, how it works internally has. There were at least three to four major revisions to the code base over the years. How major? One was a nearly complete rewrite about 18 years ago to replace a custom IO layer I had with C's FILE *-style I/O. Another one was the removal of all global variables about three years ago. And for the past year, I've been removing features that I don't use. That's a lot of documentation to rewrite every few years.

Overall, this was less obnoxious than having the LLMs write code, but I feel it's still too inaccurate to be let loose on unfamiliar codebases, which I suspect is the selling point.

