Optimizing Content for Agents

Original link: https://cra.mr/optimizing-content-for-agents/

This post pushes back on the "LLMs.txt" idea, treating it as a flawed attempt at optimizing content for AI, while rejecting the claim that AI can simply rely on existing tools such as APIs. The core message: **optimize content for agents the same way you optimize it for humans.** The key levers are content order, size, and depth, recognizing that agents often read only part of a file and benefit from information presented up front. One practical mechanism is **content negotiation** — identifying agents via the `Accept: text/markdown` header. Sentry puts this into practice by serving agents lean markdown documentation, stripping browser-specific elements and prioritizing link hierarchy. They also steer agents toward programmatic access (MCP, CLI, API) instead of the HTML UI. For projects like Warden, they serve the full markdown content so agents can bootstrap themselves. The author stresses that this is an evolving area requiring continual adaptation as agent behavior changes; ultimately, serving machine-readable content makes agents more capable and efficient.


> Just as useless of an idea as LLMs.txt was
>
> It’s all dumb abstractions that AI doesn’t need because AIs are as smart as humans so they can just use what was already there, which is APIs

LLMs.txt is indeed useless, but that’s the only correct thing in this statement. I’m here once again being rage-baited into addressing more brainless takes on social media. This one is about content optimization.

Short and to the point: you should be optimizing content for agents, just as you optimize things for people. How you do that is an ever-evolving subject, but there are some common things we see:

  • order of content
  • content size
  • depth of nodes

Frontier models and the agents built on top of them all behave similarly, with similar constraints and optimizations. For example, one thing they’re known to do, to avoid context bloat, is to only read parts of files. The first N lines, or bytes, or characters. They’re also known to behave very differently when they’re told information exists somewhere vs. having to discover it on their own. Both of those concerns are actually why LLMs.txt was a valuable idea, but it was the wrong implementation.
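That prefix-reading behavior is easy to picture with a small sketch (illustrative only; the function name and cutoffs are made up, not taken from any agent's actual implementation). If your key information sits below the cutoff, the agent never sees it:

```python
# Mimic an agent reading only a prefix of a document to avoid context bloat.
# Anything past the line/byte cutoff is invisible to the model.

def read_prefix(text: str, max_lines: int = 100, max_bytes: int = 4096) -> str:
    """Return the first max_lines lines, further truncated to max_bytes."""
    head = "\n".join(text.splitlines()[:max_lines])
    # Truncate on bytes too, dropping any partially-cut character at the end.
    return head.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
```

This is why content order matters: a page that leads with its sitemap or summary survives truncation; a page that buries it behind navigation chrome does not.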

The implementation today is simple: content negotiation. When a request comes in with Accept: text/markdown, you can confidently assume you have an agent. That’s your hook, and now it’s just up to you how you optimize it. I’m going to be brief and to the point and just give you a few examples of how we do that at Sentry.
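A minimal sketch of that detection, honoring `Accept` quality values (this is an illustration of the idea, not Sentry's implementation; the function name is made up):

```python
# Decide whether a request prefers markdown over HTML, based on the
# Accept header's media ranges and q-values (RFC 9110 section 12.5.1).

def prefers_markdown(accept: str) -> bool:
    """True when text/markdown outranks text/html (and */*) in Accept."""
    weights: dict[str, float] = {}
    for part in accept.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        weights[media] = max(q, weights.get(media, 0.0))
    md = weights.get("text/markdown", 0.0)
    # Treat */* as an HTML-ish fallback: a plain browser request should lose.
    html = max(weights.get("text/html", 0.0), weights.get("*/*", 0.0))
    return md > 0 and md >= html
```

So `Accept: text/markdown` selects the agent path, while a typical browser header like `text/html,application/xhtml+xml;q=0.9` does not.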

## Docs

We’ve put a bunch of time into optimizing our docs for agents, for obvious reasons. The primary optimizations are mostly simple:

  1. Serve true markdown content - massive tokenization savings as well as improved accuracy
  2. Strip out things that only make sense in the context of the browser, especially navigation and JavaScript bits
  3. Optimize various pages to focus more on link hierarchy - our index, for example, is mostly a sitemap, completely different from the non-markdown version
$ curl -H "Accept: text/markdown" https://docs.sentry.io/

---
title: "Sentry Documentation"
url: https://docs.sentry.io/
---

# Sentry Documentation

Sentry is a developer-first application monitoring platform that helps you identify and fix issues in real-time. It provides error tracking, performance monitoring, session replay, and more across all major platforms and frameworks.

## Key Features

* **Error Monitoring**: Capture and diagnose errors with full stack traces, breadcrumbs, and context
* **Tracing**: Track requests across services to identify performance bottlenecks
* **Session Replay**: Watch real user sessions to understand what led to errors
* **Profiling**: Identify slow functions and optimize application performance
* **Crons**: Monitor scheduled jobs and detect failures
* **Logs**: Collect and analyze application logs in context

...

In our case we actually use MDX to render these, so it involved a handful of parsing changes and overrides to allow certain key pages to render differently. The result: agents fetch pages that are much more actionable.

## Sentry

If a headless bot is fetching the website, the least useful thing you can do is serve it an authentication-required page. In our case we use the opportunity to inform the agent that there are a few programmatic ways it can access the application information (MCP, CLI, API, etc):

$ curl -H "Accept: text/markdown" https://sentry.io

# Sentry

You've hit the web UI. It's HTML meant for humans, not machines.
Here's what you actually want:

## MCP Server (recommended)

The fastest way to give your agent structured access to Sentry.
OAuth-authenticated, HTTP streaming, no HTML parsing required.

```json
{
  "mcpServers": {
    "sentry": {
      "url": "https://mcp.sentry.dev/mcp"
    }
  }
}
```

Docs: https://mcp.sentry.dev

## CLI

Query issues and analyze errors from the terminal.

https://cli.sentry.dev

...
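Server-side, the branching described above amounts to a single check on the request's Accept header. A hypothetical sketch (the `AGENT_LANDING` constant and `handle_root` function are illustrative names, not Sentry's actual code):

```python
# Instead of returning an authentication-required page to a headless agent,
# return markdown that points at the programmatic entry points.

AGENT_LANDING = """\
# Sentry

You've hit the web UI. It's HTML meant for humans, not machines.
See https://mcp.sentry.dev for structured access.
"""

def handle_root(accept_header: str) -> tuple[str, str]:
    """Return (content_type, body) for a request to the application root."""
    if "text/markdown" in accept_header:
        return ("text/markdown", AGENT_LANDING)
    # Humans (browsers) still get the normal HTML login flow.
    return ("text/html", "<html><!-- login page for humans --></html>")
```

The agent never has to parse HTML or navigate a login wall; it gets pointed straight at MCP, the CLI, or the API.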

## Warden

For projects like Warden, we actually set it up so the agent can hit the entire content to bootstrap itself:

Help me set up warden.sentry.dev

curl -H "Accept: text/markdown" https://warden.sentry.dev

# Warden

> Agents that review your code. Locally or on every PR.

Warden watches over your code by running **skills** against your changes. Skills are prompts that define what to look for: security vulnerabilities, API design issues, performance problems, or anything else you want consistent coverage on.

Skills follow the [agentskills.io](https://agentskills.io) specification. They're markdown files with a prompt that tells the AI what to look for. You can use community skills, write your own, or combine both.

- Docs: https://warden.sentry.dev
- GitHub: https://github.com/getsentry/warden
- npm: https://www.npmjs.com/package/@sentry/warden

## How It Works

Every time you run Warden, it:

1. Identifies what changed (files, hunks, or entire directories)
2. Matches changes against configured triggers
3. Runs the appropriate skills against matching code
4. Reports findings with severity, location, and optional fixes

Warden works in two contexts:

- **Locally** - Review changes before you push, get instant feedback
- **In CI** - Automatically review pull requests, post findings as comments

## Quick Start

...

## That’s It

It’s simple and it works. You should do it. You should also pay attention to how patterns are changing with agents and update your optimizations as behavior changes.
