Building agents without harness engineering

Original link: https://rajitkhanna.com/agents/

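Rather than building a custom agent harness from scratch (an inefficient task that competitors can easily leapfrog), developers should treat the agent as a primitive.

The author shares his experience building a media generation agent for prismvideos.com. After realizing that competitors such as Higgsfield were using "Hermes" (an open-source agent framework) to quickly gain sophisticated features such as observational memory, a persistent filesystem, and self-learning, the team abandoned its custom infrastructure. By migrating to Hermes, they shifted their focus from "harness engineering" to higher-value work: integrating proprietary data, media creation tools, and business-specific skills.

The author is now launching an API that lets developers deploy a fully featured Hermes agent with a single request. Developers supply only a system prompt, tools, and skills, and get a production-grade agent with built-in memory, automations, and a persistent filesystem. This approach lets companies avoid vendor lock-in and skip the "schlep" of reinventing the wheel; after all, Claude and LangChain have also begun abstracting away this infrastructure. By decoupling the agent from the harness, developers can focus on what truly differentiates their product: deep integration with customer preferences and proprietary data.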

Original text

Do not build your own agent. Host Hermes and give it tools, skills, and a system prompt. We're launching an API that makes this process easy.


For prismvideos.com, we shipped a media generation agent built on the Vercel AI SDK. Our agent understood which model to recommend to users, could generate images and videos, and could analyze videos and tell users how to recreate them. It was beautiful.

To my horror, days later, Higgsfield, a competitor of ours and a leader in the AI media generation space, launched an agent called Supercomputer. Supercomputer has observational memory (memory across sessions), skills, automations, a computer, and a filesystem. It would have taken us weeks to add all of these features. Supercomputer wasn't built with the Vercel AI SDK, Claude Agents SDK, or OpenAI Agents SDK; it is built on Hermes, the open-source personal agent with 185k+ GitHub stars (at the time of writing).

I thought Hermes was a fad for nerds (like myself). But I realized if we used Hermes as a primitive for our agent, we could get session management (per-session memory and compaction), built-in tools (web search, browser, file system navigation), skills, self-learning, and automations for free. Customers could ask our agent, "every week look at our top-performing influencer video from last week and make five variations" - a true magic moment.


We deleted our existing agent, and we launched an EC2 instance with a Hono server. The server created a Hermes agent in a Docker container for every customer. It also acted as a reverse proxy for passing messages between our app and the Hermes gateway. Now, we communicate with every user's Hermes agent over a WebSocket connection.
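
The per-customer container setup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Hono server from the post. The image name `hermes-agent:latest`, the `/workspace` mount point, the gateway port `8080`, and the helper names are all assumptions made for the example; none of them come from Hermes itself.

```python
import re
from dataclasses import dataclass, field

# All of these values are hypothetical placeholders for illustration.
HERMES_IMAGE = "hermes-agent:latest"  # assumed image name
GATEWAY_PORT = 8080                   # assumed port the agent gateway listens on
BASE_HOST_PORT = 20000                # first host port handed out by the registry


def container_name(customer_id: str) -> str:
    """Derive a Docker-safe container name from a customer id."""
    safe = re.sub(r"[^a-zA-Z0-9_.-]", "-", customer_id)
    return f"hermes-{safe}"


@dataclass
class AgentRegistry:
    """Tracks which host port each customer's container is published on."""
    ports: dict[str, int] = field(default_factory=dict)

    def port_for(self, customer_id: str) -> int:
        # Reuse the existing port so a customer always maps to the same container.
        if customer_id not in self.ports:
            self.ports[customer_id] = BASE_HOST_PORT + len(self.ports)
        return self.ports[customer_id]

    def docker_run_args(self, customer_id: str) -> list[str]:
        """Build the `docker run` argv for one customer's isolated agent."""
        name = container_name(customer_id)
        return [
            "docker", "run", "--detach",
            "--name", name,
            # A named volume keeps the agent's filesystem across restarts.
            "--volume", f"{name}-workspace:/workspace",
            # Bind to localhost only; the reverse proxy is the public entry point.
            "--publish", f"127.0.0.1:{self.port_for(customer_id)}:{GATEWAY_PORT}",
            HERMES_IMAGE,
        ]

    def upstream_url(self, customer_id: str) -> str:
        """WebSocket URL the reverse proxy would forward this customer's traffic to."""
        return f"ws://127.0.0.1:{self.port_for(customer_id)}"
```

A server could pass the result of `docker_run_args(...)` to `subprocess.run` when a customer first connects, then proxy that customer's WebSocket traffic to `upstream_url(...)`. Giving each customer a named volume is one plausible way to get a persistent filesystem together with per-container isolation.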

Rather than building observational memory, skills, self-learning, automations, and a persistent filesystem, we only needed to focus on the engineering relevant to prismvideos.com. We can give the agent our system prompt, our tools for creating media and determining which models to use via MCP, our skills files (how to create UGC videos, storyboards, visual effects), and our connectors (Meta Ads Manager, Google Drive, Resend).


As consumer-facing agents get better - Claude, ChatGPT, Manus - customer expectations rise (for B2B software too). The Claude app has memory, so now my CEO wants it. What about self-learning? Steering? Can we add the Ralph Wiggum loop?

Companies are pouring billions into research and development on agent harnesses. I have no doubt that there will be a new agent harness after Hermes with a new feature everyone wants (it appears the new thing right now is Hermes' built-in learning loop). It is highly unlikely that an AI agent startup becomes wealthy by creating the best harness for a particular use case. If anything, they only expose themselves to the risk that a competitor ships a more feature-complete agent when the next harness arrives. AI agent startups are most likely to create differentiated value by integrating with their customers' proprietary data and learning their preferences.

The agent is the new primitive. Existing agent frameworks require developers to set up:

  1. session management (in some cases)
  2. tools (in some cases)
  3. memory
  4. self-learning
  5. automations
  6. persistent filesystem
  7. container or sandboxed deployment
  8. skills
  9. MCP servers

But items one through seven are part of any agent application. Only the last two, skills and MCP servers, are specific to your product.

By programmatically creating Hermes instances, developers get the agent and the infrastructure in a single API call:

POST /v1/deployments
Authorization: Bearer $PRISM_API_KEY
Content-Type: application/json

{
  "customer_id": "cus_123",
  "name": "Acme Creative Agent",
  "runtime": "hermes",
  "model": "anthropic/claude-sonnet-4.5",
  "system_prompt": "You are Acme's media generation agent. Help the user plan, create, and iterate on high-performing short-form videos.",
  "sandbox": {
    "enabled": true,
    "type": "docker",
    "persistent_filesystem": true
  },
  "mcp_servers": [
    {
      "name": "prism-media",
      "url": "https://api.prismvideos.com/mcp",
      "tools": [
        "search_models",
        "get_model_schema",
        "get_pricing",
        "generate_image",
        "generate_video",
        "generate_audio"
      ]
    }
  ],
  "skills": [
    {
      "name": "ugc-video-creation",
      "source": "file",
      "path": ".prism/skills/ugc-video-creation/SKILL.md"
    },
    {
      "name": "storyboarding",
      "source": "inline",
      "content": "---\nname: storyboarding\ndescription: Create shot-by-shot storyboards for short-form videos\n---\n# Storyboarding\n..."
    },
    {
      "name": "social-media-visual-effects",
      "source": "url",
      "url": "https://example.com/skills/social-media-visual-effects/SKILL.md"
    }
  ],
  "secrets": {
    "META_ADS_TOKEN": "sec_meta_ads_token",
    "GOOGLE_DRIVE_TOKEN": "sec_google_drive_token"
  },
  "features": {
    "memory": true,
    "dreaming": true,
    "automations": true,
    "steering": true,
    "filesystem_webhooks": true
  }
}

Response:

{
  "deployment_id": "dep_7xK9s2",
  "customer_id": "cus_123",
  "runtime": "hermes",
  "status": "ready",
  "model": "anthropic/claude-sonnet-4.5",
  "thread_id": "thr_default_8a1",
  "filesystem": {
    "workspace_path": "/workspace",
    "persistent": true
  },
  "events": {
    "transport": "sse",
    "url": "https://api.prismagents.com/v1/deployments/dep_7xK9s2/events"
  }
}
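
As a rough illustration of issuing the request above from code, here is a minimal Python sketch using only the standard library. The base URL `https://api.prismagents.com` is an assumption inferred from the host in the example response's events URL, and `build_deployment_request` is a hypothetical helper name. The payload is a trimmed-down subset of the fields shown above.

```python
import json
import urllib.request

# Assumed base URL, inferred from the host in the example response's events URL.
API_BASE = "https://api.prismagents.com"


def build_deployment_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/deployments request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/deployments",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# A trimmed-down payload using field names from the example request.
payload = {
    "customer_id": "cus_123",
    "name": "Acme Creative Agent",
    "runtime": "hermes",
    "model": "anthropic/claude-sonnet-4.5",
    "system_prompt": "You are Acme's media generation agent.",
    "features": {"memory": True, "automations": True},
}

request = build_deployment_request("YOUR_PRISM_API_KEY", payload)

# Sending it would look like this (commented out so the sketch has no side effects):
# with urllib.request.urlopen(request) as response:
#     deployment = json.load(response)
#     events_url = deployment["events"]["url"]
```

Separating request construction from sending makes the call easy to inspect and test before pointing it at a live endpoint.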

Bring a system prompt, skills, tools, and connectors and get an endpoint to chat with an agent over SSE.
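
To make the SSE side concrete, here is a small Python sketch of a Server-Sent Events parser. It follows the standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). The event name `message.delta` and the JSON payload in the sample are made up for illustration, since the post does not describe the actual event schema.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        else:
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space
            if name == "event":
                event_type = value
            elif name == "data":
                data_lines.append(value)
            # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream contents; the real event names and payloads may differ.
sample = [
    "event: message.delta\n",
    'data: {"text": "Drafting storyboard"}\n',
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]

events = list(parse_sse(sample))
first_payload = json.loads(events[0]["data"])
```

Against a live endpoint, the same function could be fed decoded lines from a streaming HTTP response instead of the in-memory `sample` list.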


Creating an agent that people actually use involves a number of schleps. Harness engineering should not be one of them. The same insight that led us to create our API likely also prompted LangChain to launch Managed Deep Agents and Claude to launch Managed Agents. LangChain Managed Deep Agents is a hosted runtime for deploying AI agents. Developers bring their system prompt, MCP tools, skills, and subagent definitions and get an agent they can chat with. Likewise, Claude Managed Agents gives developers the agent and the infrastructure in a single API call.

LangChain Managed Deep Agents is a powerful abstraction, but it doesn't expose automations and comes without built-in self-learning or persistent goals (the Ralph Wiggum loop).

Claude Managed Agents has self-learning in research preview, but it likewise doesn't expose automations or persistent goals, and it doesn't accept video inputs via the API (a restriction of their models).

The following table compares our API with their offerings:

Capability | Managed Hermes Agents | LangChain Managed Deep Agents | Claude Managed Agents
No provider lock-in
Session management
Agent + infrastructure in one API call
Observational memory
Built-in tools: web search, browser, file search
Persistent filesystem
Image & video input
Per-container isolation
Credential management
Automations
Subagents
Dreaming
Ralph Wiggum loop
Steering

If you're a developer with a customer-facing chat product, ping me at rajit [at] prismvideos [dot] com. We are happy to build your agent for you :).

Thanks to Alex Liu, Land Tantichot, Mom, Dad, Vivek Hazari, Dan Gackle, Daniel DiPietro and Stepan Parunashvili for reading drafts of this post.
