Working With AI: A concrete example

Original link: https://htmx.org/essays/working-with-ai/

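Carson Gross reflects on his ambivalent relationship with AI, using a recent bug fix in his project *hyperscript* to illustrate its two sides. Claude excelled at locating the root cause of a parsing regression and at generating comprehensive test cases, but fell short when asked for an architecturally sound solution. Its first suggestions were hacky shortcuts that would have introduced unnecessary technical debt. Gross avoided the “Sorcerer’s Apprentice” problem (a developer relying blindly on AI without understanding the system) by using his deep knowledge of *hyperscript* to steer the AI toward a cleaner mechanism that already existed in the parser.

Gross concludes that AI is a powerful force multiplier, especially for senior developers who may be running into limits of memory or stamina. However, he warns that over-reliance on AI can lead to a dulling of the intellect. The key to effective development is keeping a human in the loop: using AI for the heavy investigation and testing work while retaining the expertise needed to demand and verify elegant solutions. He argues that developers must act as the “sorcerer” supervising the AI, not as a passive “apprentice” who sacrifices system integrity for a quick fix.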

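This Hacker News discussion centers on an essay by the author of htmx that explores the strengths and limits of using AI (Claude in particular) for programming tasks. Key points from the comments include:

* **Design limitations:** Critics argue that although large language models (LLMs) handle boilerplate and analysis well, they lack a “world model” and struggle with high-level architectural thinking. Their tendency to rush to suboptimal solutions can lead to serious long-term technical debt.
* **Importance of testing:** Commenters suggest that if models planned better and wrote robust, intelligent tests before implementing code, their design missteps might be mitigated.
* **Cognitive impact:** Whether over-reliance on AI weakens human intelligence remains contested. Some argue that the brain is plastic and that using AI as a tool is more about efficiency than skill atrophy.
* **Website criticism:** Users pointed out an irony: the new htmx website uses a complex JavaScript ecosystem (Astro), which is at odds with its minimalist, HTML-first ethos. The author explained that this was a team-led experiment, and stressed that although AI may make complex architectures easier to manage, the simplicity of core web standards remains essential.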

Original article
Carson Gross

I am, generally, ambivalent towards AI. There is no doubt it has become a very powerful tool for development in the last year, but it also comes with many dangers, both for us individually (e.g. the slow dulling of our intellects) as well as collectively (e.g. environmental concerns, increasingly expensive personal computing, etc.)

In “Code is Cheap(er)”, I warn about The Sorcerer’s Apprentice problem, where a developer becomes reliant on AI and is unable to understand and properly address issues that come up in the systems they are building.

In this article I want to go through a specific interaction that I had with AI while maintaining hyperscript to show the strengths and weaknesses of AI in general and to demonstrate The Sorcerer’s Apprentice problem (which I narrowly avoided) in particular.

For some background, hyperscript is an alternative interpreted scripting language for the web. It is, ironically, written entirely in JavaScript.

It is a strange piece of software: I intentionally broke many of the rules of parsing when writing it as an experiment to see how things would work out.

Some examples:

It is not an approach I would recommend for most programming languages, but it has worked out pretty well for this project.

Yet another demonstration that there are indeed multiple ways to skin the cat in software.

Our story begins when a user reported a regression when upgrading to the 0.9.91 release. The following expression no longer parsed properly:


fetch `{% url 'trade:get_symbol_data' %}?symbol=${symbol}` as JSON

In particular, the as JSON was binding too tightly and trying to convert the string literal into JSON before it was handed to fetch, instead of doing what the user expected (and what it did previously), namely fetching the given URL with the results treated as JSON.

This sort of binding conflict is a classic problem in parsing.

Because hyperscript is an xTalk style language and inherits many of the ambiguities of English, this problem is all the worse in it.

The first thing to do was to investigate why this regression occurred.

This is an area where I am typically going to lean on AI to help.

I use Claude, and it did an admirable job finding the root cause: in 0.9.91 I had been overly aggressive in refactoring the go command to reuse/share logic with the fetch command.

I had extracted a common method for both of these commands to use, parseURLOrExpression(), but, in doing so, I accidentally expanded the grammar after the fetch command to include the general expression, er, expression.

The as keyword has a meaning in expressions: it is a conversion expression, allowing you to convert between types:

  set x to "42" as Int

But the as keyword is also a modifier of the fetch command, telling it how to convert the response:

  fetch https://hyperscript.org as Text

(Perhaps this fact makes you throw up a little bit in your mouth. Good.)

The crux of the issue was that, inadvertently in the refactor, I had made the parser parse an expression after a fetch keyword which was now consuming the as keyword as an expression, rather than allowing it to be a modifier for fetch.
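
To make the conflict concrete, here is a minimal JavaScript sketch of a toy parser that reproduces the same shape of bug. It is not the real hyperscript parser: tokenize, parseExpression, and parseFetch are hypothetical names, and the whitespace tokenizer and one-rule expression grammar are simplifying assumptions.

```javascript
// Toy model only: a hypothetical miniature, not the real hyperscript parser.
function tokenize(src) {
  return src.trim().split(/\s+/);
}

// expression := primary ("as" TypeName)*
function parseExpression(tokens) {
  let node = { kind: "primary", value: tokens.shift() };
  while (tokens[0] === "as") {
    tokens.shift(); // consume "as"
    node = { kind: "as", target: node, type: tokens.shift() };
  }
  return node;
}

// fetch := "fetch" expression ("as" TypeName)?
function parseFetch(src) {
  const tokens = tokenize(src);
  if (tokens.shift() !== "fetch") throw new Error("expected fetch");
  // The regression in miniature: a full expression is parsed here,
  // so the expression grammar gets first claim on "as".
  const url = parseExpression(tokens);
  let responseType = null;
  if (tokens[0] === "as") {
    tokens.shift();
    responseType = tokens.shift();
  }
  return { kind: "fetch", url, responseType };
}

const result = parseFetch("fetch $url as JSON");
console.log(JSON.stringify(result, null, 2));
```

In this toy model the conversion node swallows as JSON, so url comes back as a conversion expression and responseType stays null: the command never sees its own modifier.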

With the help of Claude I was able to figure this out in a few minutes, much faster than if I had had to figure it out on my own.

AI was very helpful in finding the cause of the problem.

In fixing the problem, however, it was much weaker.

I will admit here I was being lazy and asked AI for a solution, so complaining about those solutions feels a bit, well, lazy, but I still think the string of events is informative, so let’s go through exactly what happened.

The first suggestion that was given was to parse what is called a “string-like” leaf first, then fall back to a full expression:

return this.parseElement("stringLike") || this.requireElement("expression");

This fix would have solved the immediate problem presented by the user.

However, it was very specific to the reported bug and wouldn’t have fixed the general case, such as if someone uses a variable as the target of a fetch:

  fetch $url as JSON

I rejected this proposal because of this: too hacky and not general enough.

(Note that the hyperscript parser has plenty of organically supplied hacks in it, so this may have been the pot calling the kettle black.)

The second proposal was more interesting: add a noConversions flag on the parser, set it around the URL parse, and have AsExpression.parse bail when it is set:

// AsExpression.parse()
if (parser.noConversions) return;

This will horrify many parser engineers because it makes the hyperscript parser context-sensitive.

Good.

The hyperscript parser was already context-sensitive.

In looking at this fix and thinking for a second, I realized that we already had the hacky context-sensitive infrastructure we needed without introducing a new flag on the parser, but Claude had missed it.

“Follows” In The Hyperscript Parser

In the hyperscript parser we have a notion of “follows”, that is, tokens that are claimed by a “higher up” parse element as a follow token.

The hyperscript parser is (a somewhat strange) recursive descent parser, and this allows a parse element (usually a command) to “claim” a keyword, and expressions won’t match against them during parsing.

As an example, the when feature uses or as a separator rather than as a logical connective in its declaration:

<div _="when $x or $y changes put it into me"></div>

(I can hear many parser engineers closing this window in anger. Good.)
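
A rough sketch of how a follow stack of this kind might work, written as a toy recursive descent parser in JavaScript. The names pushFollow, popFollow, and matchToken mirror the ones that appear in this essay, but the class itself, its internals, and the two-rule grammar are assumptions made for illustration, not the actual hyperscript implementation.

```javascript
// Toy model only: a hypothetical miniature, not the real hyperscript parser.
class ToyParser {
  constructor(src) {
    this.tokens = src.trim().split(/\s+/);
    this.follows = []; // keywords claimed by an enclosing parse element
  }
  pushFollow(word) { this.follows.push(word); }
  popFollow() { this.follows.pop(); }
  matchToken(word) {
    if (this.tokens[0] !== word) return false;
    this.tokens.shift();
    return true;
  }
  // expression := primary ("or" primary)*, unless "or" is a claimed follow
  parseExpression() {
    let node = this.tokens.shift();
    while (this.tokens[0] === "or" && !this.follows.includes("or")) {
      this.tokens.shift();
      node = { or: [node, this.tokens.shift()] };
    }
    return node;
  }
  // when := "when" expression ("or" expression)* "changes"
  parseWhen() {
    this.matchToken("when");
    const watched = [];
    this.pushFollow("or"); // "or" now separates the watched expressions
    try {
      do {
        watched.push(this.parseExpression());
      } while (this.matchToken("or"));
    } finally {
      this.popFollow();
    }
    this.matchToken("changes");
    return { watched };
  }
}

// Inside "when", each side of "or" is its own watched expression.
const feature = new ToyParser("when $x or $y changes").parseWhen();
console.log(JSON.stringify(feature.watched));

// Outside "when", the same tokens form a single logical "or" node.
const logical = new ToyParser("$x or $y").parseExpression();
console.log(JSON.stringify(logical));
```

The same token sequence parses two different ways depending on who has claimed or, which is exactly the context sensitivity described above.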

It turns out that this feature could be used to achieve what we wanted: rather than adding a new flag to the parser we could push as as a follow, then parse the expression, then pop it as a follow.

This would prevent the AsExpression from parsing, while still allowing most general expressions such as variables to work.

I pointed this out to Claude and, in a frisson of excitement, it told me that I was “absolutely right!” and set about using this technique to fix the bug.

Claude added the correct code to the parseURLOrExpression() which fixed the issue generally without adding any additional parser infrastructure.

Good to go.

However, as I was reviewing the change, I realized that the new fix was overly broad: both fetch and go shared this method, but only fetch used as to signal a modifier.

The existing fix prevented the perfectly valid use of as conversion expressions in go commands as well.

So I implemented the final fix myself, in FetchCommand#parse():

  parser.pushFollow("as");
  try {
    var url = parser.parseURLOrExpression();
  } finally {
    parser.popFollow();
  }
  
  if (parser.matchToken("as")) {
      ...

Here I narrowed the special case to only the fetch command, leaving go parsing unaffected.
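
The effect of that narrowing can be sketched with a toy model in JavaScript. Everything here is a hypothetical miniature: makeParser, parseFetch, and parseGo are illustrative names, the go to syntax is simplified, and the shared helper only loosely stands in for parseURLOrExpression().

```javascript
// Toy model only: a hypothetical miniature of the two commands.
function makeParser(src) {
  const tokens = src.trim().split(/\s+/);
  const follows = []; // keywords currently claimed by a command
  return {
    tokens,
    pushFollow: (word) => follows.push(word),
    popFollow: () => follows.pop(),
    matchToken(word) {
      if (tokens[0] !== word) return false;
      tokens.shift();
      return true;
    },
    // Shared helper: a full expression, minus any claimed follow keywords.
    parseURLOrExpression() {
      let node = { kind: "primary", value: tokens.shift() };
      while (tokens[0] === "as" && !follows.includes("as")) {
        tokens.shift();
        node = { kind: "as", target: node, type: tokens.shift() };
      }
      return node;
    },
  };
}

function parseFetch(src) {
  const parser = makeParser(src);
  parser.matchToken("fetch");
  parser.pushFollow("as"); // claim "as" only while parsing the fetch target
  let url;
  try {
    url = parser.parseURLOrExpression();
  } finally {
    parser.popFollow();
  }
  const responseType = parser.matchToken("as") ? parser.tokens.shift() : null;
  return { kind: "fetch", url, responseType };
}

function parseGo(src) {
  const parser = makeParser(src);
  parser.matchToken("go");
  parser.matchToken("to");
  // No follow is pushed here, so "as" is still a conversion expression.
  return { kind: "go", target: parser.parseURLOrExpression() };
}

const fetched = parseFetch("fetch $url as JSON");
const went = parseGo("go to $path as String");
console.log(fetched.url.kind, fetched.responseType);
console.log(went.target.kind, went.target.type);
```

Because the follow is pushed in the fetch parse rather than in the shared helper, the toy fetch keeps as for its modifier while the toy go still parses as String as a conversion.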

This ended up being my final answer to the bug.

Along the way I had Claude generate some tests for the various cases.

There is a good existing test suite for hyperscript, and Claude did a good job of creating small, focused tests that showed the problem and that the fix was working properly.

Another area AI appears to work well.

OK, so what is interesting about this fairly mundane bug fix story?

I think it is interesting to see where AI did well, namely in investigation and test creation, and to contrast that with where it didn’t do so well: coming up with a clean solution.

If I had not been familiar with the hyperscript parser and its infrastructure this fix could have easily led to technical debt being accrued in the project: another hacky parsing corner case, another bit of state on the parser, etc.

Technical debt, I assert without evidence, grows exponentially, and therefore it is very important to minimize it in your projects.

This story shows how having a human in the loop, working with an agent and with a good understanding of the underlying infrastructure, can be much more effective in controlling complexity than an agent left to its own devices.

Some people will look at the hyperscript code base and scoff at the notion that controlling complexity was ever a consideration at all. I am sympathetic to that view.

However, in this example we can see in a concrete scenario how complexity was restrained, at least a bit, in fixing an admittedly embarrassing bug, by a knowledgeable human working with an AI agent.

This is a situation where, rather than being a sorcerer’s apprentice and blindly accepting the solutions AI proposed, I was acting as a sorcerer (I hope that’s not too arrogant to say!) demanding a correct solution that better fit the existing codebase’s architecture.

I understood the problem and saw the correct solution and was able to work with AI to achieve it and then verify the solution with the help of AI-generated tests.

This is in contrast, I hope a good contrast, with some forms of vibe coding currently being pushed in which developers (or whatever) appear to pride themselves on not understanding what is actually going on.

Another thing occurred to me as I was going back over this experience.

I am an older developer, having turned 50 this year. As developers get older the reality is that we tend to “lose our fastball”, at least to some extent.

Practically, for me, this has meant two things:

  • I am not able to remember as much as I used to
  • I am not able to work as long of hours as I used to

It turns out that AI directly addresses both of these issues.

With respect to memory, while I can’t remember everything I used to be able to, I can understand things again very quickly with appropriate, er, prompting. AI is very good at helping me with this, and it lets me switch between open source projects and work projects much more efficiently than if I didn’t have it.

With respect to the long hours, AI is able to grind in a way that, even as a young developer, I would have had a difficult time keeping up with. This means, for example, I can have a much more extensive test suite for my projects than I would have otherwise.

Looking at the tests that Claude generated in this case, they are more extensive than what I probably could have mustered the energy to do myself.

So AI has addressed two fundamental (relative) weaknesses I have developed as an older developer.

On the other hand, I am very worried that it is also enabling a more general regression in my overall intelligence. This is something that occurs naturally as you age anyway. AI reliance may accelerate this process, however, and I have to say, looking back at this story, I’m a bit ashamed of how long I leaned on Claude before just doing the right darned thing myself.

This is an area I am still trying to navigate myself.

I wanted to write up this series of interactions because I thought it captured some of the good and some of the bad of AI assistance in coding. It demonstrated the value of a reasonably competent developer in the loop working with an AI agent, and also showed the danger of blindly accepting the first (or second) solution that an AI agent suggests to a problem.

I hope that it is useful to you as you develop your own thoughts and strategies around AI agents.
