Parity
Imagine a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks: "Create a note summarizing my meeting and tag it as urgent." If the UI can do it but the agent can't, the agent is stuck.
The fix: Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do. This isn't about a one-to-one mapping of UI buttons to tools—it's about achieving the same outcomes.
The discipline: When adding any UI capability, ask: Can the agent achieve this outcome? If not, add the necessary tools or primitives.
A capability map helps:
| User Action | How the Agent Achieves It |
|---|---|
| Create a note | write_file to notes directory, or create_note tool |
| Tag a note as urgent | update_file metadata, or tag_note tool |
| Search notes | search_files or search_notes tool |
| Delete a note | delete_file or delete_note tool |
The test: Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?
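As a concrete sketch of parity, the four rows of the capability map could resolve to tools like these. This is a minimal sketch, assuming notes are stored as Markdown files with a JSON sidecar for metadata; the directory layout and function bodies are illustrative assumptions, not a prescribed design:

```python
import json
from pathlib import Path

NOTES_DIR = Path("notes")  # assumption: notes live as files in one directory

def create_note(title: str, body: str) -> str:
    """Parity with the UI's 'new note' action."""
    NOTES_DIR.mkdir(exist_ok=True)
    path = NOTES_DIR / f"{title}.md"
    path.write_text(body)
    return str(path)

def tag_note(title: str, tag: str) -> None:
    """Parity with the UI's tag picker; tags go in a sidecar metadata file."""
    meta_path = NOTES_DIR / f"{title}.meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {"tags": []}
    if tag not in meta["tags"]:
        meta["tags"].append(tag)
    meta_path.write_text(json.dumps(meta))

def search_notes(query: str) -> list[str]:
    """Parity with the UI's search box: a naive full-text scan."""
    return [p.name for p in NOTES_DIR.glob("*.md")
            if query.lower() in p.read_text().lower()]

def delete_note(title: str) -> None:
    """Parity with the UI's delete action."""
    (NOTES_DIR / f"{title}.md").unlink(missing_ok=True)
```

The exact file layout doesn't matter; what matters is that every row in the capability map resolves to a tool call the agent can make.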
Granularity
The key shift: The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It can handle unexpected cases, adjust its approach, or ask clarifying questions; the loop continues until the outcome is achieved.
The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.
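To make the contrast concrete, here is a bundled tool next to the atomic primitives it hides, building on the parity sketch above (summarize stands in for a hypothetical LLM call):

```python
def summarize(transcript: str) -> str:
    ...  # stand-in for a hypothetical LLM call

# Bundled: the decision logic lives inside the tool. The agent can only
# trigger the whole choreography, never vary any step of it.
def file_meeting_note(transcript: str) -> None:
    summary = summarize(transcript)
    create_note("Meeting", summary)        # from the parity sketch
    if "deadline" in summary.lower():      # "urgent" is now frozen in code
        tag_note("Meeting", "urgent")

# Atomic: expose summarize, create_note, and tag_note separately, and the
# agent decides what counts as urgent, what the title should be, and
# whether the note should be split in two.
```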
Composability
Composition works for developers and users alike: you can ship new features by adding prompts, and users can customize behavior by modifying prompts or creating their own.
The constraint: this only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, composition breaks down.
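One way to picture a feature shipped as a prompt rather than code, assuming the prompt lives as a constant or file handed to the agent (the name and wording here are hypothetical):

```python
# Hypothetical feature-as-prompt: no new tools, just composition of the
# atomic primitives that already exist.
WEEKLY_REVIEW_PROMPT = """\
Find every note created in the last 7 days, group them by tag, and create
a note titled 'Weekly Review' that summarizes open questions and lists
anything tagged urgent.
"""
```

A user who wants a different cadence or grouping edits the prompt, not the product.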
Emergent Capability
Example: "Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled." You didn't build a commitment tracker, but if the agent can read notes and tasks, it can accomplish this.
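A hypothetical trace of how the agent might satisfy this request with nothing but the atomic tools from the sketches above (the tasks.md path is an assumption):

```python
from pathlib import Path

# The agent chose this sequence itself; nothing here is feature code.
# search_notes and NOTES_DIR come from the parity sketch above.
notes = [(NOTES_DIR / name).read_text() for name in search_notes("meeting")]
tasks = Path("tasks.md").read_text()   # assumption: tasks live in one file
# The model then compares commitments mentioned in the notes against the
# task list and reports anything promised but not yet scheduled.
```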
This reveals latent demand. Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.
This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.
Improvement Over Time
Accumulated context: The agent maintains state across sessions, tracking what exists, what the user has done, and what worked (a minimal sketch follows at the end of this section).
Prompt refinement at multiple levels: developer-level updates, user-level customization, and (advanced) agent-level adjustments based on feedback.
Self-modification (advanced): Agents that edit their own prompts or code require safety rails—approval gates, checkpoints, rollback paths, and health checks.
The mechanisms are still being discovered. Context and prompt refinement are proven; self-modification is emerging.
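To make accumulated context concrete, one minimal sketch: persist a small state file between sessions and inject it into the agent's context at startup. The file name and schema here are assumptions, not a standard:

```python
import json
from pathlib import Path

STATE_PATH = Path("agent_state.json")  # assumption: one state file per user

def load_state() -> dict:
    """What exists, what the user has done, and what worked."""
    if STATE_PATH.exists():
        return json.loads(STATE_PATH.read_text())
    return {"known_notes": [], "recent_actions": [], "learned_preferences": {}}

def save_state(state: dict) -> None:
    STATE_PATH.write_text(json.dumps(state, indent=2))

# At session start, load_state() is added to the agent's context; at session
# end, the agent records what it did and what the user corrected.
```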