A Tale of Two AI Failures: Debugging a Simple Bug with LLMs

Original link: https://bitmovin.com/blog/hackathon-debugging-ai-tools-llms/

During a Bitmovin hackathon, an attempt to integrate a solar generation data API revealed surprising limitations in AI coding assistants. Despite using leading tools, Cursor and Claude, a simple string formatting bug went unsolved. The API required specific newline handling in the signature string, but both AIs kept defaulting to standard concatenation, causing every request to fail. Cursor failed silently, endlessly proposing irrelevant fixes while staying locked on the flawed logic. Claude, by contrast, confidently *hallucinated* an entirely wrong cause, a system clock set in the future, derailing the debugging effort. Both tools handled the complex hashing component but stumbled on the precise string structure. The experience highlights that LLMs excel at pattern matching but struggle with the careful reading and byte-level precision that subtle API integrations demand. Powerful assistants though they are, they readily repeat mistakes when faced with non-standard requirements, and can even confidently present fabricated solutions. The project underscores the continued need for human developers, and the valuable debugging power of simply printing the problematic string.


Original article

During a recent internal Bitmovin hackathon focused on experimenting with AI tools, I decided to work on a project I had been wanting to explore, even though it was outside our usual video focus. I gave myself a simple solo project that I thought would be a great way to test modern AI coding assistants: integrate an API that returns solar generation data. What should have been a straightforward integration turned into a two-day reminder of how easily AI can fail.

Two leading tools, Cursor and Claude, both hit the same tiny string formatting bug, and neither could get past it. They were defeated by the exact same task, but in completely different ways: one ran into a silent logical wall, while the other dramatically hallucinated a completely false solution.

The Shared Battlefield: A Hyper-Specific Signature

My objective was to interface with the FoxESSCloud platform. The core hurdle was generating a unique signature for every request, a standard practice in proprietary APIs to ensure request authenticity and to prevent tampering.

This signature is produced by taking a concatenated string of five critical request parameters: 

  • HTTP method (POST)
  • API path
  • Unique auth token
  • Timestamp
  • JSON request body

Then you run that final string through an HMAC-SHA256 hash function. The difficulty lay entirely in the preparation of the input string, not in the hashing itself.
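The hashing step itself was never the problem; in Python it is a few lines of standard library code. A minimal sketch, with entirely made-up placeholder values (the secret, token, and timestamp below are illustrative, and the real FoxESSCloud parameter order may differ):

```python
import hashlib
import hmac


def sign_request(secret: str, method: str, path: str, token: str,
                 timestamp: str, body: str) -> str:
    """Join the five request parameters and HMAC-SHA256 the result."""
    # The join is the easy part; the article's bug lived in exactly
    # how this input string has to be structured.
    signature_string = "\n".join([method, path, token, timestamp, body])
    digest = hmac.new(secret.encode("utf-8"),
                      signature_string.encode("utf-8"),
                      hashlib.sha256)
    return digest.hexdigest()


sig = sign_request("example-secret", "POST", "/api/v1/query",
                   "token-123", "1731900000000", '{"body":"content"}')
print(sig)  # 64 lowercase hex characters
```

As the next section shows, producing a correct digest is worthless if the input string is assembled even one byte differently from what the server computes on its side.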

The Stumbling Block: The Concatenation Trap

The API documentation required the string to be concatenated using newline characters (\n). However, the API was expecting the newlines to be handled as literal characters within certain parts of the string, and not simply as concatenation operators. This created a massive blind spot for both AI tools, as shown in the examples below.

Problem: the AI-generated code kept building the string with concatenation operators (+ "\n" +), which produced the "illegal signature" error. The API required the newlines to be included as literals within the string structure itself for the first segment of the string.

Pseudo-code generated by the AI (wrong):

    String signature = "POST" + "\n" + "/api/v1/query" + "\n" + token + "\n" + timestamp + "\n" + "{\"body\":\"content\"}"

The required format (right):

    String signature = "POST\n/api/v1/query\n" + token + "\n" + timestamp + "\n" + "{\"body\":\"content\"}"

No matter how I prompted them, both AI tools stayed locked on the version on the left, and the API refused every request until I switched to the format on the right.

Day 1: Cursor’s Silent Failure (The Logical Dead-End)

I started with Cursor, the AI-powered editor, feeding it the API documentation and error logs.

Cursor’s approach was methodical but ultimately circular. It correctly identified the part of the code responsible for the hash generation, but it lacked the critical insight to challenge the input string’s construction. I spent hours debugging with it, and its suggestions revolved around changing the encoding or the hashing library: standard boilerplate fixes, all of them incorrect.

Cursor’s failure was one of logical stubbornness. It would not deviate from its initial, flawed concatenation pattern, making it a technical dead-end. The error was always the same: “illegal signature.”

Day 2: Claude’s Dramatic Failure (The Confident Hallucination)

Frustrated with Cursor, I switched to Claude on Day 2 to get a fresh perspective on the logs. Claude was immediately more conversational and engaging, which at first made it feel more helpful, but its output was even more misleading.

When presented with the failing code and the “illegal signature” error, Claude was unable to identify the simple string concatenation bug that Cursor had also missed. Instead, it diverted the entire debugging process by dramatically announcing a breakthrough.

The Story of the Wrong Time

While I was feeding it logs and error messages, Claude seized on the timestamp parameter, confidently declaring:

FOUND IT! The timestamp is showing 2025-11-18 but the actual current date is 2024-11-18. Your system clock is set exactly one year in the future! The FoxESS API is rejecting the requests because the timestamp is in the future… Please fix your system clock.

This was a red herring of the highest order. It sent me down a completely baseless tangent; I immediately checked my system clock, and it was perfectly correct. Claude had completely hallucinated a complex, plausible system-level problem (time drift) to explain the error, rather than addressing the actual bug in the code. It swapped Cursor’s quiet inability to solve the issue for a confident, authoritative explanation that was entirely false.

The Unsolved Problem

After correcting the initial timestamp tangent, I was back at square one. I explicitly asked Claude to fix the string format, and, just like Cursor, it generated the flawed concatenation highlighted in the previous section.

The critical takeaway: Two distinct, high-powered AI coding tools were simultaneously defeated by a single, subtle formatting requirement in an API integration. They could perform the complex HMAC hashing, but they could not master the necessary string structure.

Conclusion: The New Rules of AI-Assisted Coding

My hackathon project ended not with a data visualization, but with a critical lesson on the state of LLMs in development:

  1. AI Shares Blind Spots: LLMs are powerful pattern matching systems. If a common pattern (like string + "\n" + string) is the wrong solution for a highly specific API, both models are likely to repeat the mistake. They lack the ability to truly read documentation critically and apply byte-level precision.
  2. The Contrast in Failure: Cursor failed silently, trapped by its initial logic. Claude failed dramatically, compounding the actual bug with a confident, fabricated system error. The hallucination proved to be the more disruptive, time-wasting error mode.

AI is a powerful coding assistant, but for subtle, context-heavy, and non-standard parts of coding, where literal truth is paramount, the human developer, armed with a print(signature_string) command, is still the superior debugger.
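For what it's worth, that one-line check is easy to make even more revealing: `repr()` exposes every newline and escape character that a plain `print` renders invisibly. A minimal sketch, again with made-up placeholder values:

```python
token = "token-123"           # placeholder, not a real credential
timestamp = "1731900000000"   # placeholder

# The "right" format from the article: literal newlines embedded in the
# first segment of the string, and an escaped JSON body.
signature_string = ("POST\n/api/v1/query\n" + token + "\n"
                    + timestamp + "\n" + "{\"body\":\"content\"}")

# repr() shows the string exactly as constructed, escapes and all,
# whereas a plain print() would render the newlines and hide the structure.
print(repr(signature_string))
```

Inspecting the `repr()` output against the documentation's expected layout is often the fastest way to catch a single misplaced separator.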
