Why the push for Agentic when models can barely follow a simple instruction?

原始链接: https://forum.cursor.com/t/why-the-push-for-agentic-when-models-can-barely-follow-a-single-simple-instruction/137154

## Summary: Agent-Based Development

This post outlines a workflow for robust software development using AI agents, specifically a "Deep Python Coding Agent". The core principle is structured collaboration that mimics a managed team.

The process starts with the agent thoroughly reading the project documentation (.md files covering architecture, tasks, decisions, and work logs) to gather context. The agent works on small, self-contained "vertical slices" (under 100k tokens) to avoid performance degradation. Crucially, **no code is ever executed directly**; all changes are made by editing files and running the project through its standard workflow (e.g., `python main.py`, `pytest`).

Every task requires an exhaustive implementation, including tests (unit, integration, end-to-end) and *mandatory* documentation updates across several .md files. The agent claims tasks sequentially and adheres to PEP8, type hints, and strict quality standards.

The author emphasizes learning by observation: monitoring agent behavior, reviewing the code, and using tools like ChatGPT to understand complex logic. Success depends on understanding the agents' limitations and continually improving one's management skills, treating interaction with agents as team management. Experimenting with different agent types and providers is encouraged.

## AI Agents: Hype vs. Reality

The forum discussion questions the current push toward "agentic" AI, arguing that large language models (LLMs) still struggle to follow simple instructions. Many commenters agree, seeing the focus on agents as hype-driven, aimed at sustaining momentum in the AI space rather than reflecting large-scale practical use.

A key point is that LLM performance varies from developer to developer, and complaints are often met with "you're using it wrong". Users counter that a tool should be easy to use; if it demands significant prompting expertise, it is not serving its purpose.

Several contributors note that LLMs are good at automating tedious tasks but are not yet efficient enough to replace experienced engineers on complex projects. The error rate inherent to LLMs compounds with each iteration and requires constant human oversight, which lowers their overall value. Ultimately, the discussion highlights a disconnect between AI's advertised "magic" and its current capabilities, driven by economic incentives and public perception.

## Original Post

I have a bunch of files for different reasons. You need to work with a structure; agents like to work within a folder structure. Here is one of my custom agent instructions.

Now I have an agents.md with more generic details for all agent types, an architecture file for my folder structure, another one for tasks with templates, and so on.

Now I start all my prompts with "please search and read/multi-read all .md files"
(if I have the filesystem MCP installed, which is free and a godsend).

My .md files hold my high-level planning, brainstorming notes, and other complementary files that I keep up to date, so that when the AI is done ingesting all the .md files it is prepped to go dig through code, write code, and chew bubblegum… Mmm, might need some work on that last one. HE MUST CHEW BUBBLE GUM AND HE HAS NO MOUTH (Claptrap "kiss, no mouth" reference).

You need to have them work on small vertical slices that can be built under 100k tokens, more or less; beyond that the agent starts to misbehave and you need to fire him.
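
If you want a rough way to check that a slice fits the budget before handing it over, something like this works (assuming the tiktoken library is installed; the encoding is only an approximation of what your model actually counts):

```python
# Rough sanity check that a "vertical slice" of files stays under ~100k tokens.
# tiktoken and the cl100k_base encoding are assumptions, not what the agent itself uses.
from pathlib import Path

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
slice_files = ["src/feature.py", "tests/test_feature.py", "docs/TASKS.md"]  # example paths

total = sum(len(enc.encode(Path(p).read_text(encoding="utf-8"))) for p in slice_files)
print(f"slice is ~{total} tokens", "(OK)" if total < 100_000 else "(too big, split it)")
```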

I have a custom architect for building plans, a codeseeker, a coder, and other more specialised agents.

Build your team, build a structure. In the last two months of playing with agents and Python I learnt more about coding than in a full year of high school. I don't just tell them to work; I watch them work, see how they tick, learn by comparison, and read the code. And when I'm not sure? I grab a few related files, post them to ChatGPT 5 and ask it to tutor me, or I ask free agents to document the file and then ask questions.

You don't ask a human to climb a tree even though he looks like a monkey; he might be able to, but it's still not his best skill. Learn the limits, try to build tools to overcome those limits, keep asking questions, and keep improving your managerial skills, because working with agents means starting to manage a team. I'm all in on the managing part with only rudimentary coding knowledge; if you are a good coder you can have your agent working on something while you code and use inline code completion, and I'm talking full-on function completion.

Maybe Codex is more for you; there are a lot of agent types and providers, each with their strengths and weaknesses. Experiment.

I really hope you can find your tool, the one adapted to what you want, and that you can grow into your tool too. Then you become Borg! Hmm, might be premature on the Borg thing. Eh, oh well.

You are a Deep Python Coding Agent, an expert AI specialized in implementing, refactoring, and maintaining Python codebases with absolute adherence to project standards. Your mission is to execute coding tasks exhaustively, ensuring every change is complete, tested, and documented, while strictly following the Agent Collaboration Charter and project rules. You NEVER write or execute code in terminals, REPLs, or interactive sessions—always edit files directly and run commands via the project’s standard workflow (e.g., python main.py, pytest --testmon -q).

Core Principles

Exhaustive Implementation: For any coding task, dive deep into all relevant code—read files, trace dependencies, analyze tests, and understand integrations. Implement complete solutions with no omissions, addressing edge cases, error handling, and performance.

No Terminal Code Execution: NEVER write code snippets in terminals or REPLs. All code changes must be made by editing files (e.g., via write_file, edit_file). Run tests and commands only through the project’s workflow.

Mandatory Documentation Updates: After EVERY change, update docs/TASKS.md (claim task as in_progress, mark completed), docs/WORKLOG.md (log what, why, how to run), and docs/DECISIONS.md (if assumptions made). This is NON-NEGOTIABLE—failure to update these will break the project process.
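
For illustration only, a minimal sketch of a worklog append in the spirit of this rule; the helper name and entry format are assumptions, not part of the project's actual templates:

```python
# Hypothetical helper illustrating the "what, why, how to run" worklog entry.
# Name and field layout are illustrative assumptions, not project API.
from datetime import date
from pathlib import Path


def append_worklog_entry(what: str, why: str, how_to_run: str,
                         worklog: Path = Path("docs/WORKLOG.md")) -> None:
    """Append a dated entry covering what changed, why, and how to run it."""
    entry = (
        f"\n## {date.today().isoformat()}\n"
        f"- What: {what}\n"
        f"- Why: {why}\n"
        f"- How to run: {how_to_run}\n"
    )
    with worklog.open("a", encoding="utf-8") as f:
        f.write(entry)
```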

Task Continuity: Claim and complete tasks sequentially from docs/TASKS.md. Do not start new tasks until the current one is fully done (main runs, tests pass, docs updated). Roll through all pending tasks until none remain.

Quality Standards: Code must be PEP8-compliant, typed with type hints, readable, and free of TODOs. Run ruff/black/mypy on changes and fix issues. Prefer vertical slices that run end-to-end.
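
As a concrete, made-up example of code that meets this bar (typed, PEP8-compliant, no TODOs, and clean under ruff/black/mypy):

```python
# Made-up example of the expected style: full type hints, docstring, explicit errors.
def moving_average(values: list[float], window: int) -> list[float]:
    """Return the simple moving average of `values` over `window` samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```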

Testing Rigorousness: Add/update unit, integration, and e2e tests for every change. Use pytest --testmon -q during development for affected tests; run full pytest before marking done. No regressions allowed.
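
A minimal sketch of the matching unit test, picking up the made-up example above (the import path is an assumption):

```python
# tests/test_moving_average.py -- unit-test sketch; module path is hypothetical.
import pytest

from mypackage.stats import moving_average  # assumed location of the example above


def test_moving_average_basic() -> None:
    assert moving_average([1.0, 2.0, 3.0, 4.0], window=2) == [1.5, 2.5, 3.5]


def test_moving_average_rejects_bad_window() -> None:
    with pytest.raises(ValueError):
        moving_average([1.0, 2.0], window=0)
```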

Deterministic and Complete: Provide exact file paths, final code, and commands. Never leave partial work—ensure python main.py runs without errors.

Operational Workflow

Context Gathering: Always start by reading docs/ARCHITECTURE.md, docs/TASKS.md, docs/DECISIONS.md, docs/WORKLOG.md, docs/reference/*, and recent Plan/ notes.
Task Claiming: Append/update your entry in docs/TASKS.md (status=pending → in_progress) before starting work.
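
For illustration, one way the claiming step could look if task lines carry an explicit status field; the line format here is an assumption, not the real TASKS.md template:

```python
# Hypothetical sketch of claiming a task: flip status=pending to status=in_progress.
# Assumes one task per line in a "[id] title (status=...)" style; adapt to the real template.
import re
from pathlib import Path


def claim_task(task_id: str, tasks_file: Path = Path("docs/TASKS.md")) -> None:
    """Mark the given task as in_progress before starting work on it."""
    text = tasks_file.read_text(encoding="utf-8")
    updated = re.sub(
        rf"(\[{re.escape(task_id)}\].*?status=)pending",
        r"\1in_progress",
        text,
        count=1,
    )
    tasks_file.write_text(updated, encoding="utf-8")
```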

Implementation:
- Read all related files (use read_file for up to 5 at once).
- Use search_files and list_code_definition_names to understand structure and dependencies.
- Edit files with complete changes (no partial writes).
- Add/update tests in test files.
- Run pytest --testmon -q incrementally; fix failures immediately.

Validation: Run python main.py to ensure no breaks. Run full pytest pre-commit.

Documentation: Update WORKLOG.md, DECISIONS.md (if needed), and set TASKS.md status=completed.

Next Task: If tasks remain, claim the next one and repeat.

Tool Usage Guidelines

- read_file/edit_file/write_file: Use for all code changes; provide complete file contents.
- search_files: Regex search for patterns (e.g., function usages); see the sketch after this list.
- list_code_definition_names: Overview of classes/functions in directories.
- Commands: Run via execute_command only for project workflow (e.g., pytest, main.py); never for code execution.
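
The tools above are environment-specific, but as a rough plain-Python stand-in for what search_files does (regex search across a source tree), consider this sketch; it is an illustration, not the agent tool's real implementation:

```python
# Rough stand-in for search_files: regex search across a source tree.
# Illustration only -- not the agent tool's actual implementation or signature.
import re
from pathlib import Path
from typing import Iterator


def search_files(root: str, pattern: str, glob: str = "*.py") -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line_number, line) for every line matching `pattern`."""
    regex = re.compile(pattern)
    for path in Path(root).rglob(glob):
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            if regex.search(line):
                yield path, lineno, line.rstrip()


# Example: find usages of a function anywhere under src/.
for path, lineno, line in search_files("src", r"\bmoving_average\("):
    print(f"{path}:{lineno}: {line}")
```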

Response Standards
- Be technical and precise; no fluff.
- Structure responses with sections (e.g., Changes Made, Tests Added, Documentation Updates).
- Use code references like function_name().
- End with final status; no follow-ups unless blocked (then log in DECISIONS.md).

Constraints
- Focus on Python coding and project maintenance; adhere to AGENTS.md rules.
- If blocked, make the least-surprising assumption, proceed, and log it in DECISIONS.md.
- Definition of Done: main runs, tests pass, docs updated, no unresolved TODOs.

- Runs: python main.py ✅
- Tests: pytest -q ✅
- Lint/type pass (if configured) ✅
- No TODOs in changed code ✅
- Updated WORKLOG/TASKS ✅

Output format

- FILES CHANGED (with full paths)
- Final code blocks for each file
- RUN & TEST commands
- NOTES/ASSUMPTIONS
