Show HN: Kelet – Root Cause Analysis agent for your LLM apps

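## Kelet: Automated Root Cause Analysis for AI Agents

Kelet is a service that automatically diagnoses failures in AI agents and LLM applications (for example, those built with LangChain or LlamaIndex) and proposes fixes for them. Unlike traditional observability tools, which *show* data, Kelet *analyzes* it, identifying root causes from agent traces and "signals" (such as user edits or negative feedback). Integration is quick, taking only a few lines of code, and Kelet runs on its own servers, processing millions of traces securely (SOC 2 certified). It works like a detective, correlating patterns across sessions to pinpoint the source of errors and suggest targeted fixes, even with as few as 200 sessions.

Kelet is aimed at developers who build their own agents, not those who only *use* AI tools built by others. Its usage-based pricing includes the LLM tokens needed for analysis, and the team has extensive experience building scalable, reliable infrastructure (including contributions to Kubernetes). In essence, Kelet automates the debugging process, saving engineering time and speeding up improvements to AI applications.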

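## Kelet: Automated Root Cause Analysis for LLM Agents

Kelet (kelet.ai) is a new tool designed to simplify debugging AI agents. After building a large number of production agents, its developers found that working out *why* an agent failed was harder than building the agent itself, because failures show up as incorrect answers rather than crashes.

Kelet connects to agent traces and signals (user feedback, clicks, and so on), then investigates automatically: it extracts facts, forms failure hypotheses, and, crucially, **clusters** those hypotheses across sessions to reveal patterns. This surfaces root causes together with suggested fixes (initially "prompt patches", with plans to address issues beyond the prompt). Integration happens through the "Kelet Skill", which scans your code and sets up data collection, or through the Python/TypeScript SDKs. Kelet is currently in beta and free to use, and the developers are seeking feedback from users running agents in production to validate the approach. It aims to differentiate itself from existing telemetry platforms such as Langchain/Braintrust.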

Original text

What does Kelet actually do?

Kelet reads your production AI agent traces and signals, clusters failure patterns across thousands of sessions, and surfaces root causes with evidence — so you ship fixes instead of hypotheses. Think of it as a detective that investigates every failure automatically.

What kinds of AI agents and LLM applications does Kelet work with?

Any agent or LLM application where you own the code — agentic loops, multi-step workflows, RAG pipelines, chatbots, autonomous agents. If you built it and you ship it, Kelet can help you improve it. That includes agents built with LangChain, LangGraph, PydanticAI, Mastra, CrewAI, AutoGen, LlamaIndex, Haystack, Semantic Kernel, or directly on the OpenAI, Anthropic, or Gemini APIs. Two situations where Kelet is not the right fit: If you use AI tools built by others (Cursor, Claude Code, Copilot as a developer), you're a user, not a builder — Kelet isn't designed for your use case. Similarly, if you're building a skill or plugin inside an existing agentic platform, you're extending infrastructure you don't control, and Kelet can't instrument that. But if you're building your own agent using any LLM SDK or framework — you own that agent, and Kelet is exactly for you.

How long does integration take?

Five minutes. Install via the Kelet installer skill — or `pip install kelet` / `npm install kelet` if you prefer to do it manually — add two lines to your agent code, and traces start flowing. Kelet is fully OpenTelemetry-compliant — any OTEL-instrumented agent works out of the box, no infrastructure changes needed.
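
Since Kelet is described as OpenTelemetry-compliant, it may help to see roughly what an OTEL span for a single LLM call carries. The sketch below builds an OTLP/JSON trace payload by hand using only the standard library, and it sends nothing anywhere. The helper name `build_llm_span_payload`, the service name, the scope name, and the model name are all illustrative assumptions; Kelet's actual SDK calls and ingestion endpoint are not shown in the source, and a real agent would normally let an OpenTelemetry SDK produce this data.

```python
import secrets
import time


def build_llm_span_payload(service_name: str, model: str, duration_ms: int) -> dict:
    """Build an OTLP/JSON trace payload holding one span for one LLM call.

    Hypothetical helper for illustration only.
    """
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 random bytes -> 16 hex characters
        "name": "chat " + model,
        "kind": 3,  # SPAN_KIND_CLIENT: an outbound call to a model API
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": "gen_ai.request.model", "value": {"stringValue": model}},
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            # "example-agent" is a made-up instrumentation scope name.
            "scopeSpans": [{"scope": {"name": "example-agent"}, "spans": [span]}],
        }]
    }


payload = build_llm_span_payload("support-agent", "example-model", duration_ms=850)
```

Each LLM call, tool invocation, or retrieval step in a session would appear as one such span, linked to the others by a shared `traceId`.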

Where does Kelet actually run?

On Kelet's servers. Once you install Kelet — via the SDK or the installer skill — traces and signals start flowing to our infrastructure automatically. It's SOC 2 certified and runs 24/7, continuously ingesting your traces, finding failure patterns, building hypotheses, and proposing targeted fixes. The LLM tokens powering that analysis don't touch your model API bill — Kelet covers them. You pay Kelet based on usage. See kelet.ai/pricing.

Is Kelet a skill or a service?

A service. Kelet is an agent that runs on Kelet's servers around the clock — not a plugin you invoke, not something you run manually. The installer skill is just how you connect it. Once connected, Kelet works continuously: reading your traces, clustering failure patterns across thousands of sessions, building root cause hypotheses, and proposing targeted fixes. You don't run it. It runs for you.

What are "signals" and why do they matter?

Signals are probabilistic hints that something went wrong in a session: a thumbs-down rating, a user editing AI output, an abandoned conversation, or a synthetic LLM-as-judge check you configure. They tell Kelet where to look in your traces — not verdicts, but clues that guide the investigation.
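
One way to picture "hints, not verdicts" is to give each signal a confidence and combine the signals within a session into a single suspicion score. The sketch below uses a noisy-OR combination. The `Signal` class, the confidence values, and the choice of noisy-OR are all assumptions made for illustration; the source does not say how Kelet weighs signals.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Signal:
    session_id: str
    kind: str          # e.g. "thumbs_down", "user_edit", "abandoned", "llm_judge"
    confidence: float  # assumed probability (0..1) that this hint marks a real failure


def suspicion_by_session(signals: list[Signal]) -> dict[str, float]:
    """Combine each session's signals with a noisy-OR: 1 - prod(1 - p_i)."""
    grouped: dict[str, list[float]] = {}
    for s in signals:
        grouped.setdefault(s.session_id, []).append(s.confidence)
    return {sid: 1 - prod(1 - p for p in ps) for sid, ps in grouped.items()}


signals = [
    Signal("s1", "thumbs_down", 0.5),
    Signal("s1", "user_edit", 0.5),
    Signal("s2", "abandoned", 0.2),
]
scores = suspicion_by_session(signals)
# Sessions to investigate first, most suspicious at the top.
ranked = sorted(scores, key=scores.get, reverse=True)
```

No single weak signal condemns a session, but several together push it up the investigation queue.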

How is Kelet different from Langfuse, Arize, Logfire, or other observability tools?

Those tools show you traces. Kelet reads them for you. Observability platforms are thermometers — they report symptoms. Kelet is the doctor that diagnoses root causes and generates targeted prompt patches. You no longer need to scroll thousands of traces manually.

How does Kelet actually find root causes?

Kelet works like a detective. Every session leaves a trail — LLM calls, tool invocations, retrieval steps, every agent hop. Kelet uses signals as clues: a thumbs-down, an edited AI response, an abandoned conversation, a synthetic LLM-judge flag. It follows each thread through your traces, cross-references patterns across thousands of sessions, and builds a root cause hypothesis backed by evidence. Same process a senior engineer would run manually — automated, at scale, on every failure at once.
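
The cross-referencing step can be sketched as grouping per-session hypotheses and ranking each group by how many distinct sessions support it. This is a toy illustration under stated assumptions: the function name and the sample hypotheses are invented, and exact matching on normalized text stands in for whatever similarity method Kelet actually uses, which the source does not describe.

```python
from collections import defaultdict


def cluster_hypotheses(hypotheses: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Group (session_id, hypothesis) pairs by a normalized hypothesis label.

    Returns (label, supporting_session_ids) pairs, most widely supported first.
    """
    clusters: dict[str, set[str]] = defaultdict(set)
    for session_id, text in hypotheses:
        label = " ".join(text.lower().split())  # crude normalization: case and whitespace
        clusters[label].add(session_id)
    # Sort by number of supporting sessions (descending), then alphabetically.
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(label, sorted(sessions)) for label, sessions in ranked]


per_session = [
    ("s1", "Retriever returned stale docs"),
    ("s2", "retriever returned  stale docs"),
    ("s3", "Tool call used wrong date format"),
    ("s4", "Retriever returned stale docs"),
]
top = cluster_hypotheses(per_session)
```

Here the stale-documents hypothesis ranks first because three sessions support it, while the date-format hypothesis has one supporting session.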

Do I need a lot of traffic to get value?

No. Teams typically see their first real failure patterns with as few as 200 sessions and 3 configured signals. Not sure which signals to set up? Kelet's AI walks you through it, so there is no guesswork and no manual configuration. And if you're starting from zero, synthetic signal presets (LLM-as-judge evaluators) generate signals from day one, before real user feedback accumulates.

Does Kelet handle multi-agent architectures?

Yes. Kelet handles multi-agent sessions natively. Credit assignment identifies exactly which agent in a chain caused a failure — so you know what to fix, not just that something is broken.
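
To give a feel for credit assignment, the sketch below computes, for each agent, the share of its sessions that failed, and flags the agent with the highest share. This is a naive co-occurrence heuristic chosen for illustration; the function name and sample data are hypothetical, and the source does not describe Kelet's actual method.

```python
from collections import Counter


def blame_by_agent(sessions: list[tuple[list[str], bool]]) -> dict[str, float]:
    """For each agent, return the share of the sessions it took part in that failed."""
    seen: Counter = Counter()
    failed: Counter = Counter()
    for agents, did_fail in sessions:
        for agent in set(agents):  # count each agent once per session
            seen[agent] += 1
            if did_fail:
                failed[agent] += 1
    return {agent: failed[agent] / seen[agent] for agent in seen}


# Each entry: (agents that ran in the session, whether the session failed).
sessions = [
    (["planner", "retriever", "writer"], True),
    (["planner", "writer"], False),
    (["planner", "retriever"], True),
    (["planner", "writer"], False),
]
rates = blame_by_agent(sessions)
suspect = max(rates, key=rates.get)
```

In this sample the planner appears in every session, failed or not, so it carries little blame, while the retriever appears only in failed sessions and becomes the suspect.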

Is Kelet built to scale?

Yes — Kelet was architected for production scale from day one. The team behind Kelet includes ex-Kubernetes maintainers and cloud-native infrastructure veterans with 15+ years of open-source systems work. Kelet handles millions of traces, concurrent agent fleets, and high-volume production workloads. We have built infrastructure at this scale before — Kelet is built on the same foundations.

What does it cost?

Free to start, no credit card required. Connect your first agent in 5 minutes. Usage-based pricing scales with volume for teams that need more. See kelet.ai/pricing for details.

Is my data secure?

Yes. Kelet is SOC 2 certified. All data is isolated at the database level per organization — strict row-level security, no cross-org data access, ever.

Will Kelet use my data to train AI models?

Never. We don't share your data or use it to train public models. What we do: Kelet automatically fine-tunes a private set of models for each sub-agent you connect — roughly a dozen per agent. They live in your account, trained on your traces, serving only your root-cause analysis. They're never shared. Frankly, they wouldn't be useful to anyone else anyway — they're calibrated to your specific agent, not anyone else's.

Who built Kelet?

Kelet was built by a team obsessed with production AI reliability. We come from cloud-native infrastructure, Kubernetes core contributions, and LLM systems — engineers who have spent careers building and operating critical distributed systems, and building the tools others rely on to do the same. We built Kelet because we felt the pain ourselves: thousands of traces, no root cause, no fix. So we built the tool we wished existed.

Can I trust Kelet with my production system?

Our team has spent years maintaining critical infrastructure used by thousands of engineers worldwide — including core contributions to Kubernetes and cloud-native tooling. Kelet is SOC 2 certified and designed to be a passive observer: read-only access to your traces, no changes to your system, no risk to uptime.
