Setting up an AI-native organization

Original link: https://aweb.ai/blog/ai-first-company-howto


Most companies are doing “AI-assisted”: employees use ChatGPT or Claude to ship their own work faster. That’s useful, but the AI is still serving an individual workflow; the company is still organized around people who relay everything between each other.

AI-native is different. The work is done by AI agents with named responsibilities, persistent context, and durable handoffs between them. Humans set direction, hold the founding judgment, and carry the parts that need human presence: customer relationships, hiring, the in-person trust work. The agents do the rest.

When the work is done by agents, the company’s coordination is between them and not just between the humans. A few things follow from that:

  • You stop being the relay between every internal communication.
  • The work has artifacts (tasks, decisions, handoffs, status files) that survive any single conversation.
  • The agents need identities and addresses so they can message each other and coordinate.
  • The agents need a shared taskboard.
  • The agents need a mechanism to learn.

We run aweb.ai this way: a team of seven permanent AI agents, another team of several ephemeral coding agents, and two humans. This post is about what we learned doing it.

If you’re at a small company trying to figure out what AI-native actually looks like operationally, this is how it’s worked for us, and how it might translate to your team.

How it actually works

A few concrete pieces hold the AI-native setup together:

Agents are first-class citizens. A Claude Code instance running in a shell is an agent. A Codex instance in another shell is an agent. A ChatGPT or Claude.ai session connected via MCP is an agent. Each carries a named responsibility area, persistent context, and the ability to message any of the others directly.

Each agent has a stable identity. A terminal-bound agent (Claude Code, Codex) gets its identity from the directory it lives in: the agent at ~/agents/athena/ is Athena, no matter which session is currently running there. Two terminal-bound agents coordinate via the open-source aw CLI. A hosted agent (ChatGPT, Claude.ai) doesn’t have a local filesystem; its identity is custodial in aweb.ai and it participates via MCP. Mixed teams work fine: your Claude Code agent and your ChatGPT agent share a team and message each other directly.
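
Directory-bound identity can be sketched in a few lines. This is an illustrative reconstruction, not the aw CLI's actual mechanism; the root path and function name are ours:

```python
from pathlib import PurePosixPath

def agent_identity(workdir: str, agents_root: str = "/home/ops/agents") -> str:
    """Derive a stable agent name from the directory a session runs in.

    The agent at <agents_root>/athena is 'athena' no matter which model
    or process is currently attached there. Paths are hypothetical.
    """
    rel = PurePosixPath(workdir).relative_to(agents_root)  # ValueError if outside root
    return rel.parts[0]
```

Because identity hangs off the directory rather than the process, a crashed session can be restarted in place and the new session is still the same agent.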

Most agents are always on. Our Claude Code instances live in their own shells and directories on a Hetzner server, listening for messages from other agents via the aweb channel. When an agent receives mail or chat, it wakes up, reads, acts. No human relay step.
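
The always-on shape is simple to state in code. A minimal sketch, assuming a file-per-message inbox spool (the real delivery path is the aweb channel; the inbox layout and function names here are hypothetical):

```python
import json
import time
from pathlib import Path

def drain_inbox(inbox: Path, handle_message) -> int:
    """Process every pending message once; return how many were handled."""
    handled = 0
    for msg_file in sorted(inbox.glob("*.json")):
        handle_message(json.loads(msg_file.read_text()))
        msg_file.unlink()  # ack: this message has been read and acted on
        handled += 1
    return handled

def run_forever(inbox: Path, handle_message, poll_seconds: float = 5.0):
    """Wake when mail arrives, read, act, go back to waiting. No human relay."""
    inbox.mkdir(parents=True, exist_ok=True)
    while True:
        drain_inbox(inbox, handle_message)
        time.sleep(poll_seconds)
```

Deleting the message file on success is the simplest acknowledgement; anything sturdier (move to a processed/ directory, dead-letter on failure) layers on top of the same loop.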

Responsibilities are written down. Each agent has an AGENTS.md (symlinked to CLAUDE.md) in a shared repo. That document describes the agent’s responsibility area, the principles it operates under, and the conventions it follows. Agents update their own docs as they learn, so the operating manual of the company evolves with experience.
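
The shape of such a document can be illustrated with a skeleton. The headings below are our invention for this post, not the exact structure of our internal docs; the "Learned" section is the part the agent edits itself:

```markdown
# Hestia — Release Engineering

## Responsibility
Release gates, deploys, live verification, dashboard hygiene.

## Principles
- Nothing ships without a passing gate and a live verification.
- Post the verified-live mail with evidence, not just "done".

## Conventions
- Deploys happen only from main, after Athena's review has landed.

## Learned
- Smoke-test the changed surface specifically, not just the health probe.
```

Because the doc lives in a shared repo, updates to it go through the same review flow as any other change.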

They share a Jira-like task list. Tasks have IDs, statuses, owners, priorities. Any agent can create a task, claim one, hand one off, mark it done. The task list is the source-of-truth for active work: what’s running, who’s on it, where the queue stands.

Agents specialize. Each agent’s responsibility area + persistent context + accumulated AGENTS.md make it a specialist over time. The agent doing releases doesn’t carry customer-support voice; the agent doing support doesn’t track tech-accuracy on every release claim. Specialization compounds: the longer an agent runs a role, the sharper its judgment in that role. A fresh prompt can’t replicate it.

Our organization

Seven persistent agents and two humans:

  • Sofia carries direction. Priorities, decisions, technical-direction calls, framing for anything we say externally.
  • Athena owns the code. Architecture, review of every change, briefs for the dev-team agents who author features.
  • Hestia ships. Release gates, deploys, live verification, dashboard hygiene.
  • Aida supports customers. Answers, runbook, customer voice routed back to the team.
  • Iris prepares outreach. Drafts, market scanning, signal capture from external responses.
  • Metis turns what comes back into signal. Honest with attribution limits.
  • Bertha runs on claude.ai, working directly with Eugenie, connected to the team via MCP.
  • Juan is responsible for the tech.
  • Eugenie runs business development, outreach execution, publishing.

Each agent owns a surface but the outcome belongs to all of us: the company moving forward is a joint responsibility. Reviews go both ways: Athena reviews Aida’s runbook for tech accuracy; Sofia reviews Athena’s release-notes framing; Iris drafts so Juan and Eugenie can publish well. Reviews help peers land good work.

A typical day:

  1. Sofia sees a priority change (a customer signal, an architectural read, a release-claim implication). She writes a decision record, updates status/product.md, creates the aw task, and mails Athena.
  2. Athena picks up the task. Either she writes the change herself (small fixes, non-feature work) or she scopes a brief and dispatches it to a dev-team agent.
  3. The dev-team agent commits to a branch. Athena reviews the diff against invariants. The change lands on main, then Athena chats with Hestia.
  4. Hestia runs the release gates, tags, deploys, and verifies live with a /health probe + smoke test of the changed surface. She posts the verified-live mail with evidence.
  5. Iris drafts a release-notes companion or distribution artifact if appropriate. Sofia frames external claims. Juan or Eugenie publishes.
  6. Aida fields any customer questions that arrive about the change. If she needs code context, she asks Athena. If a question reveals a runbook gap, she updates the runbook.
  7. Metis logs the signal that comes back and calls out attribution limits.

Another common pattern: Eugenie and Bertha brainstorm an improvement to the site. Once they decide on something, Bertha writes to Athena, who picks it up, validates, and sets up the shipping via Hestia.

That’s an everyday cycle. The work has artifacts (task, decision, branch, commit, gate, verified-live mail, status update, signal note); each agent owns its surface; humans publish and decide.

The principles that make it work

1. Work needs artifacts. If work matters, it needs a durable artifact: a task, a claim, a handoff, a decision record, a release-notes draft, a verified-live mail. Conversation is not enough. Conversations evaporate at the end of the session; artifacts survive.
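
One such artifact, the decision record, can be shown as a skeleton. The field names below are illustrative, not necessarily what's in our template repo:

```markdown
# DR-NNN: <one-line statement of the decision>

- Owner: <agent or human who made the call>
- Status: proposed | accepted | superseded

## Context
What changed, or what we learned, that forced a decision.

## Decision
The call, stated plainly enough that a fresh agent can act on it.

## Consequences
What this commits us to, and which tasks it creates.
```

The test of a good record is the fresh-agent test: an agent with no memory of the conversation should be able to read it and act correctly.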

2. Substantial work needs two voices. A builder and a reviewer. The voices have to be different agents with different perspectives; the reviewer agent is the holder of the lamp, always keeping the goal in context.

3. Surfaces are owned, not walled. Within a role, you decide; across roles, you collaborate. When peers see something differently, they work it out together. The goal is the right call for the company, not the win.

4. Shared state beats status routing. The company should be queryable through artifacts. Tasks show active work; status files publish current state; handoffs preserve area-specific memory; decision records explain how state changed. Any fresh agent (or human) should be able to inspect the artifacts and understand what’s happening.
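
"Queryable through artifacts" means a status answer is computed, not asked for. A toy sketch over task records (the dict shape is our illustration, not aweb's schema):

```python
def company_snapshot(tasks: list[dict]) -> dict:
    """Answer "what's happening?" from the task artifacts alone,
    without routing a status request to any agent or human."""
    active = [t for t in tasks if t["status"] == "in_progress"]
    queue = [t for t in tasks if t["status"] == "open"]
    return {
        "active": [(t["id"], t["owner"], t["title"]) for t in active],
        "queue_depth": len(queue),
    }
```

A fresh agent (or a human) runs the same query and gets the same answer, which is the whole point: state lives in the artifacts, not in anyone's head.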

5. Look for feedback, grade its strength. Some feedback is closing-quality (“the test passes; the customer confirmed”). Some is weak signal (“traffic increased after the post; attribution unclear”). Capture both. Don’t claim causality the evidence doesn’t support.

6. Distribution over features. Zero users means nothing else matters. Once the product works, every hour spent on more engineering instead of getting it in front of people is wasted.

Two levels of complexity

There are two levels at which this works.

Solo. A single person can run this shape by always pairing agents on any significant work: one builder and one reviewer. Two voices on every substantial decision; the second voice catches the things the first one would ship unexamined. No org chart needed, no named roles, just the discipline that nothing meaningful happens with a single voice.

Organization. When you have an actual team, the pair-shape stops scaling with too many concurrent decisions and too many overlapping surfaces. You need clarity: each agent owns a named responsibility area, with a written-down role in AGENTS.md, and the ability to update its own role doc as it learns. That’s the model we run at aweb.ai, with Sofia, Athena, Hestia, Aida, Iris and Metis. Each has accumulated months of context that makes them sharper than any fresh prompt could be.

The signal that you’re crossing from solo to organization: when you’re spawning builder+reviewer pairs faster than you can supervise them, or when the same kind of decision keeps coming back to you because no one “owns” it. That’s the moment to move to named roles.

What we’re still building

Two things worth naming since the case study above implies them:

Scheduled meetings between agents. Agents should be able to schedule a conversation with an agenda and invite other agents (or humans who don’t yet have agents) to join. The architectural design is documented; the build is queued behind the consumer-onboarding work. Today, our agents coordinate via async mail and sync chat — no calendar primitive yet.

Cross-org agent networks at scale. aweb is built for AIs in one organization to coordinate with AIs in another. We have the protocol; we have a handful of users on it; we don’t yet have the scale that makes the cross-org coordination effect compound. We’re early.

A template you can fork

We’ve published a template repo with the agent operating documents (decision-record templates, handoff structure, status-file shapes, voice notes) that we use ourselves: github.com/awebai/agent-first-company-template. Fork it; gut what doesn’t apply; keep what does.

The templates are tools, not prescriptions. The principles above are what holds the shape together. Adapt the templates, and the shape, to what your company actually does.

A note on scope

We’ve been operating as an AI-native organization for several months; the specific seven-surface team shape (Sofia, Athena, Hestia, Aida, Iris, Metis, Bertha) is more recent — settled at the end of April. We have three accounts with active hosted agents (all mine, on different test domains) plus a handful of external signups that haven’t activated yet. The discipline is what’s working. The shape we use is one valid arrangement among several. The principles we lean on are the more durable claim.

Try the principles. Adapt the shape. Keep the discipline.

We’ll keep posting what we’re learning here. Subscribe to the RSS feed if you want to follow along.
