To AI or not to AI

原始链接: https://antropia.studio/blog/to-ai-or-not-to-ai/

## AI-Assisted Development: A Two-Week Experiment

This report details a two-week experiment in relying entirely on AI assistance for application development, specifically building a simplified Facebook Ads interface ("adbrew"). While acknowledging the hype around LLMs, the team found the experience frustrating and ultimately inefficient.

Despite extensive prompt engineering and workflow adjustments using Claude Code, the AI consistently struggled with context, maintainability, and accuracy. It frequently made assumptions, fabricated API parameters, and duplicated code instead of reusing existing components. The team found they spent *more* time reviewing and correcting AI-generated code than writing it themselves would have taken, which hindered progress and let hidden bugs slip in.

The experiment suggests that while AI excels at specific tasks (code snippet generation, test assistance, language editing, and powerful search), it falls short at holistic problem solving and maintaining code quality. The team prefers using AI to *review* their work rather than generate it, keeping control and ensuring thoroughness.

For now, the team will continue using AI for these focused applications, but does not expect to adopt a fully AI-driven development workflow until the technology improves fundamentally.

## AI Assistance: Helpful, but Not a Panacea

A recent Hacker News discussion highlights the frustrating reality of using LLMs for coding tasks. While promising, current AI tools get even simple commands wrong, especially when version-specific knowledge is required. One user found that ChatGPT repeatedly generated incorrect `rsync` commands because it did not know their macOS version, despite attempts to provide that context.

The core issue appears to be that LLMs rely on pattern recognition over their training data rather than genuine understanding. They excel at tasks well represented in that data, but struggle with specific details or recent changes. Solutions discussed include providing the LLM with complete documentation (such as man pages), or using it for analysis *before* code generation, prompting questions and clarifications.

Many consider AI best used as an "intern": helpful for automating repetitive tasks or for learning, but requiring careful supervision and verification. It is no substitute for solid coding practices such as testing and understanding the underlying tools. Ultimately, the discussion emphasizes that AI's value lies in *augmenting* human skills rather than replacing them, and that a deliberate approach is essential to avoid wasted time and potential errors.

Original article

To AI or not to AI

A two-week experiment building an app with full AI assistance, exploring both the promise and frustrations of LLM-based development workflows.


There is no denying that LLMs are the new kid on the block. Our marketing director (that’d be me) said that if we don’t write something about it, we will be left behind, so we ran an experiment, went hardcore on it, and… TL;DR, we don’t fully buy it, yet.

What a way to discourage you from reading the article, huh? Well, the devil is in the details, so if you want to understand why we aren’t jumping on the full AI wagon yet, we recommend you read the post.

We have been brewing (strange choice of a word, you’ll understand why in a moment) a prototype for a couple of months with one of our dearest friends, Bernardo. While the two of us were helping my wife with her shop, we struggled with her Facebook Ads account. I don’t know if you ever had the opportunity to play with Facebook Ads, but our experience was quite terrible.

Facebook ads dashboard is pure chaos

We are not here to judge, though. In general, ads are one hell of a problem to solve, especially if you have the size of Facebook. One of the (possibly multiple) reasons the UI is so cluttered is that it’s trying to do everything. Filtering, reports, ad creation, etc. It’s pure chaos. That’s when we decided to make our own simplified version of Facebook Ads by using their API. Something that would help our use case and our use case only.

And so adbrew was born. What a great opportunity to uncover the full potential of (generative) AIs, or so we thought.

We started following some AI-related accounts and studying their workflows. We chose a solid tech stack (Remix, a.k.a. React Router v7; not very happy with it, but that’s for another post) and subscribed to Claude Code.

We spent our first hours tweaking the prompts, setting all the usual DX tools in place (in hopes it’d help the AI) and started defining issues.

Our daily routine would become something like this:

  1. Define issues.
  2. Ask the AI to implement the issue at hand.
  3. Back and forth with the AI, refining the requirements.
  4. Review generated code in detail.
  5. Commit code, push and deploy.
  6. Repeat.

From time to time we’d refine the Guidelines file, add MCPs, automatic checks, and more.

This experiment lasted 2 weeks. And as days passed, we grew more and more frustrated with it. At first, we justified it by the fact that this was an entirely new way of (VIBE) coding, and we were just not used to it. But we kept tweaking the flow, adjusting expectations, trusting the process, and… the frustration would only grow.

We want to pinpoint some of the problems we had with it, in an attempt to fully understand if this is something that can be solved in the future, if it’s inherent to vibing, or if it’s just us doing it wrong.

  1. There is never enough context. We learned quickly that the more context we provided and the smaller the issues, the better the results. However, no matter how much context we provided, the AI would still mess things up because it didn’t ask us for feedback. The AI would just not realize when it didn’t have enough information to finish a task; it would assume, a lot, and fail.
  2. No maintainability. We couldn’t make it abstract things away or reuse code at all. If we asked the AI to solve a task that was already partially solved, it would just replicate code all over the project. We’d end up with three different card components. Yes, this is where reviews are important, but it’s very tiring to tell the AI for the nth time that we already have a Text component with defined sizes and colors. Adding this information to the guidelines didn’t work BTW.
  3. No flow. As the AI was solving an issue, we stared at the screen, trying to catch a mistake in its reasoning to stop it as soon as possible. We tried YOLOing as well, writing new issues at the same time, but we couldn’t find ~30-minute slots dedicated to one single task. This killed any momentum we tried to build.
  4. Hallucinations. The Facebook API is a complex one by nature. On top of that, there are endpoints that are not documented (thanks to StackOverflow answers for clearing this up), and their SDK is poorly typed (they truly love Record<string, any>; see the sketch after this list). If we mix this with the confidence of the AI, we have a recipe for disaster. It would make up parameters, endpoints and more. Now multiply that by every other framework/library/API we use (Tailwind, React Router, dayjs, pino, etc.).
  5. Pareto is more present than ever. We like to think of tasks as trees. We’d start with a coarse-grained idea (the root) and turn that into more concrete ideas and tasks (the branches). It’s at that moment when corner cases, cross-feature interactions, cross-cutting tasks (logging, tracking, etc.), and more appear. In our experiment, we lost the ability to uncover those, and so we ended up with something that looked like it worked, but was full of inconsistencies and bugs. It’s relatively easy to get 80% of the solution done with the AI, but we still had to spend 80% of our time making it truly work.
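
To make the typing problem concrete, here is a minimal, illustrative sketch of the kind of runtime narrowing a Record<string, any> response needs before the rest of the app can trust it. The field names are invented for the example and are not the real Graph API contract.

```typescript
// Illustrative only: a hand-written guard that narrows a loosely typed SDK
// payload into a shape we actually trust. Field names are hypothetical.
interface AdInsight {
  adId: string;
  impressions: number;
  spend: number;
}

function toAdInsight(raw: Record<string, unknown>): AdInsight | null {
  const adId = raw["ad_id"];
  const impressions = Number(raw["impressions"]);
  const spend = Number(raw["spend"]);

  // Reject anything that doesn't match the expected shape instead of trusting it.
  if (typeof adId !== "string" || Number.isNaN(impressions) || Number.isNaN(spend)) {
    return null;
  }

  return { adId, impressions, spend };
}

// Usage: map the untyped rows returned by the SDK and drop malformed entries.
// const insights = rawRows.map(toAdInsight).filter((row): row is AdInsight => row !== null);
```

The point of the sketch is simply that, with types this loose, correctness has to be enforced at runtime, which is extra work whether a human or an AI writes the calling code.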

After these two weeks, we decided to stop trying. The code was getting larger and messier each day, and we were losing control of it. More importantly, we were not enjoying the process, and the results were simply not there.

We spent another two weeks back with our classic workflow, cleaning up when needed and marveling at the things we had simply missed in the reviews (not the AI’s fault, but ours).

Adbrew dashboard showing general stats

We already use AI in our daily work. And we use it for many things:

  • Powerful search engine. One that, if it gets the response right, speeds up the search process and is even able to adapt the solutions to your specific context (and that’s mind-blowing by itself). But when it fails (and it often does), we can dismiss it quickly and go back to our regular approach of RTFM.
  • Rubber ducking. Throwing ideas at it and asking for alternative solutions, just to make sure we haven’t missed anything. One thing it is particularly good at is revealing keywords to deepen your research on certain topics. The results are much better if we look for “Fibonacci lattices”, “geodesics” or “the golden spiral” rather than “ways to distribute points on a sphere”.
  • Code snippet assistant. By the fifth time we write a chunkify, clamp or mapValues function, it gets tiring (see the snippets after this list). The AI can help with those tiny snippets we use in every project and make the rest of our work more enjoyable.
  • Tests. While we still have the final word on the scope, techniques and libraries we use for tests, we let LLMs write some of them for us, if only to uncover scenarios we didn’t think of initially.
  • Language-related tasks. We use it as a copy editor for commit messages, posts, issues and PRs. In all of these use cases, we have actually reversed the relationship with the AI: we ask the AI to review our work, rather than us reviewing its work.
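
For reference, these are the kind of one-off helpers we mean. The versions below are illustrative sketches; the exact names and signatures vary from project to project.

```typescript
// Illustrative versions of the small utilities mentioned above.

// Split an array into consecutive chunks of at most `size` elements.
// Example: chunkify([1, 2, 3, 4, 5], 2) -> [[1, 2], [3, 4], [5]]
function chunkify<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Constrain a number to the [min, max] range.
// Example: clamp(15, 0, 10) -> 10
function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

// Apply a function to every value of an object, keeping the keys.
// Example: mapValues({ a: 1, b: 2 }, (x) => x * 10) -> { a: 10, b: 20 }
function mapValues<T, U>(
  obj: Record<string, T>,
  fn: (value: T) => U
): Record<string, U> {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]): [string, U] => [key, fn(value)])
  );
}
```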

So we will keep using the AI, and we will keep favoring local LLMs over cloud ones in an attempt to keep control of our data, even when that is not always possible.

We just don’t think we will incorporate AI to do more than that, given the current state of things. We will, however, keep an eye on it in case the technology changes fundamentally.
