How Kepler built verifiable AI for financial services with Claude

原始链接: https://claude.com/blog/how-kepler-built-verifiable-ai-for-financial-services-with-claude

## Kepler: Building Trustworthy AI for Finance

Kepler Finance was founded in 2025 to address a critical need in the financial industry: trustworthy AI. Recognizing that financial firms require auditable, verifiable results, Kepler built a platform on Claude that delivers instantly verifiable answers to complex financial questions. They found that existing AI tools produced output lacking transparency, raising concerns about accuracy and auditability.

Kepler's solution is a "trust and verification layer" for AI, combining Claude's reasoning capabilities with deterministic infrastructure. Analysts can ask questions in plain English and receive results traceable back to source documents; the company has indexed more than 26 million SEC filings and 50 million public documents.

Key to their success is strategic use of different Claude models (Opus for complex reasoning, Sonnet for high-throughput tasks), rigorous testing, and a focus on provenance. Kepler's architecture separates interpretation (Claude) from execution (deterministic systems), enabling rapid scaling and modular improvement.

Having started in the demanding financial domain, Kepler aims to extend its platform to other industries that need verifiable answers from large document collections, such as healthcare and law. They emphasize giving Claude clearly defined tasks, building strong evaluation pipelines, and designing for auditability from the start.


Original article

In our series, How startups build with Claude, we highlight how startups are transforming their industries with AI. In this article, we share how Kepler built a trust and verification layer for AI in financial services.

The quick pitch
- Name: Kepler
- Founded: 2025
- Founders: Vinoo Ganesh (CEO) and John McRaven (CTO)
- Stack: AWS, Rust, Python, containers for orchestration
- Growth: Indexed 26M+ SEC filings, 50M+ public documents, 1M+ private documents, and 14,000+ companies across 27 global markets in less than three months

Financial firms operate in a heavily regulated environment where reporting has to be auditable and accountable. Every figure in a regulatory filing, deal pitch, or research report needs to be verifiable against source documents.

The tools the financial industry has traditionally relied on can pull data, but they still require analysts for that verification process. An analytics system can’t interpret a freeform question, decompose it into steps, or work out that a single metric requires pulling three different line items across specific fiscal periods. AI systems can do that interpretation, but they handle it in the same step as the computation, so the numbers they produce are generated by the model, which can make mistakes.

Vinoo Ganesh and John McRaven spent years at Palantir building data systems for defense, energy, and financial firms. That work shaped how they think about trust in environments where answers must be verifiable. Before founding Kepler, they spoke with 147 financial firms, including private equity, hedge funds, and investment banks, and heard the same thing at nearly all of them: everyone wanted to use AI for research, but nobody trusted the output. As one managing director told them, "How am I supposed to trust something I can't audit?" 

The duo’s answer was to build deterministic infrastructure that serves as a trust and verification layer for AI. That infrastructure, together with Claude as the reasoning and interpretation layer, powers Kepler Finance: a research platform for financial services used by analysts to ask questions in plain English and receive instantly verifiable answers. 

Handling long, multi-step tasks and flagging ambiguity

Financial analysis involves complex, multi-step calculations, dense data, and overloaded terminology, and has no tolerance for error. With that in mind, Kepler needed a model that could hold a long plan together without drift and flag ambiguity. 

For example, if an analyst asks for a company’s inventory days outstanding over the last eight quarters, the model needs to figure out what the answer needs: the right formula, correct fiscal periods, and any restatements that might affect the numbers. 
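The decomposition described above can be sketched in a few lines. This is a minimal, hypothetical illustration (field names and figures are invented, and real filings also require restatement handling): inventory days outstanding is average inventory divided by cost of goods sold, scaled by the days in the period.

```python
from dataclasses import dataclass

@dataclass
class Quarter:
    """One fiscal quarter's figures as pulled from a filing (hypothetical fields)."""
    period: str
    inventory: float   # ending inventory
    cogs: float        # cost of goods sold for the quarter
    days: int = 91     # days in the fiscal period

def inventory_days_outstanding(q: Quarter, prior: Quarter) -> float:
    """DIO = average inventory / COGS * days in period."""
    avg_inventory = (q.inventory + prior.inventory) / 2
    return avg_inventory / q.cogs * q.days

q1 = Quarter("2024Q1", inventory=500.0, cogs=1000.0)
q2 = Quarter("2024Q2", inventory=700.0, cogs=1200.0)
print(round(inventory_days_outstanding(q2, q1), 1))  # 45.5
```

Answering the analyst's question means running this computation once per quarter, after the model has picked the right formula and the right fiscal periods.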

The team benchmarked across all the frontier models. They found that on straightforward queries, the models performed comparably. But on long, multi-step plans with interdependencies, all but Claude started taking shortcuts or losing track of constraints by the fourth or fifth step. "On our workloads, Claude was the model that consistently held the plan together," Ganesh says. "Other models would start strong and then quietly drop a constraint by step five."

The clearest difference was how each model handled uncertainty and kept humans in the loop. For example, in situations where one term can have two different meanings, most models picked one meaning and kept going. Claude stopped and asked the analyst to decide. "That behavior matters more than any benchmark score," Ganesh says. "One wrong assumption early in a financial analysis breaks everything downstream."
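The escalation behavior described above can be made a hard rule in the pipeline rather than left to the model. A minimal sketch, with an invented glossary: when a term maps to more than one definition, the system stops and returns the options to the analyst instead of guessing.

```python
# Hypothetical glossary: some terms have one meaning, others are overloaded.
GLOSSARY = {
    "revenue": ["total revenue (GAAP)"],
    "ebitda": ["reported EBITDA", "adjusted EBITDA (company-defined)"],
}

def resolve_term(term: str) -> dict:
    """Resolve a term to a single definition, or escalate to the analyst."""
    meanings = GLOSSARY.get(term.lower(), [])
    if len(meanings) == 1:
        return {"status": "resolved", "meaning": meanings[0]}
    if len(meanings) > 1:
        # Stop and ask rather than silently pick one meaning.
        return {"status": "needs_clarification", "options": meanings}
    return {"status": "unknown", "options": []}

print(resolve_term("revenue"))            # resolved
print(resolve_term("ebitda")["status"])   # needs_clarification
```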

Engineering the context around Claude

The Kepler team found that Claude produced better results when given precisely defined tasks enhanced with structured domain knowledge, definitions, and hard boundaries on what to resolve versus what to escalate. "In finance, the model can’t be the whole system. We treat it as one stage in a pipeline whose job is to hand the model exactly what it needs to succeed at exactly that stage," says McRaven. "Prompt engineering optimizes a call, while context engineering optimizes the system around it."

The team built deterministic execution environments that Claude can invoke for every operation that needs to be provably correct, such as computing a ratio or resolving a fiscal period. They developed a proprietary ontology that maps financial concepts to precise definitions and formulas, customizable on a per-use basis. Security and access control restrictions are enforced at every step, governing which data sources each user can pull from. On top of this, they built recurring, customizable skills for the most common workflows in their pipeline, such as enterprise value calculations across complex capital structures (e.g. handling preferred shares, convertibles, and minority interests) and segment revenue waterfall reconciliation across reporting period changes. These skills coordinate between deterministic and nondeterministic stages and are idempotent by design: the same input will always generate the same output.
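An ontology entry of the kind described might pair a financial concept with a precise formula and its required inputs; the executor is a pure function, so the same input always produces the same output. This is a minimal sketch with invented field names and figures, not Kepler's actual schema; the enterprise value formula shown is the standard one (equity value plus debt, preferred, and minority interest, less cash).

```python
# Hypothetical ontology: concept -> precise formula + required fields.
ONTOLOGY = {
    "enterprise_value": {
        "formula": lambda f: (f["market_cap"] + f["total_debt"]
                              + f["preferred_equity"] + f["minority_interest"]
                              - f["cash"]),
        "required_fields": ["market_cap", "total_debt", "preferred_equity",
                            "minority_interest", "cash"],
    },
}

def compute(concept: str, fields: dict) -> float:
    """Deterministic, idempotent execution: same input, same output, no model."""
    spec = ONTOLOGY[concept]
    missing = [k for k in spec["required_fields"] if k not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return spec["formula"](fields)

fields = {"market_cap": 900.0, "total_debt": 300.0, "preferred_equity": 50.0,
          "minority_interest": 20.0, "cash": 120.0}
print(compute("enterprise_value", fields))  # 1150.0
```

Because the formula lives in the ontology rather than in the model's output, the number is provably correct given its inputs, and the model's job reduces to selecting the right concept and supplying the right fields.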

Next, they decomposed their workflows into a multi-stage pipeline, matching different Claude models to different stages: Opus 4.7 for complex reasoning like decomposing intent, resolving ambiguity, and producing structured execution plans, and Sonnet 4.6 for higher-throughput stages where tasks are more constrained. They also trained their own specialized models for recall (some use Claude as the foundation, some are proprietary to Kepler), scoring 94% accuracy on tasks like mapping financial statement labels to standardized taxonomy codes, compared with the 38-46% accuracy achieved by other models. 
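Stage-to-model routing like this can be as simple as a lookup table. A hypothetical sketch (the stage names and model labels are invented for illustration): reasoning-heavy stages route to Opus, constrained high-throughput stages default to Sonnet.

```python
# Hypothetical routing table: pipeline stage -> model class.
ROUTES = {
    "decompose_intent":  "opus",    # complex reasoning
    "resolve_ambiguity": "opus",
    "build_plan":        "opus",
    "extract_labels":    "sonnet",  # constrained, high-throughput
    "normalize_fields":  "sonnet",
}

def model_for(stage: str) -> str:
    # Unknown stages fall through to the cheaper, faster model.
    return ROUTES.get(stage, "sonnet")

print(model_for("decompose_intent"))  # opus
print(model_for("extract_labels"))    # sonnet
```

The payoff is economic as well as technical: only the stages that need deep reasoning pay for it.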

The team tests every prompt change, model upgrade, and context modification against thousands of cases before going to production. They’ve built automated evaluation pipelines that compare Claude's output against known-correct answers at every stage, checking both the structured plan and the final computed result. When a test fails, they can trace whether the issue was in Claude's reasoning, the context provided, or the downstream execution. When Anthropic ships a new model version, Kepler benchmarks it within hours and knows exactly which stages improve, which regress, and which need prompt adjustments.
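An evaluation harness with that failure-attribution property might look like the following sketch (the case schema and toy pipeline are hypothetical): it checks the structured plan first and the computed value second, so a failure is tagged as a reasoning issue or an execution issue.

```python
def evaluate(pipeline, cases, tolerance=1e-6):
    """Compare a pipeline's output against known-correct answers per stage.

    Each case holds an expected plan and an expected final value; pipeline(q)
    returns (plan, value). The first stage to diverge names the failure.
    """
    failures = []
    for case in cases:
        plan, value = pipeline(case["question"])
        if plan != case["expected_plan"]:
            failures.append((case["question"], "plan"))       # reasoning/context issue
        elif abs(value - case["expected_value"]) > tolerance:
            failures.append((case["question"], "execution"))  # downstream computation issue
    return failures

# Toy pipeline standing in for the real multi-stage system.
def toy_pipeline(question):
    return ["lookup", "divide"], 45.5

cases = [{"question": "DIO over 8 quarters?",
          "expected_plan": ["lookup", "divide"],
          "expected_value": 45.5}]
print(evaluate(toy_pipeline, cases))  # [] -> all cases pass
```

Running the same case set against a new model version immediately shows which stages improved and which regressed.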

Scaling with Claude

Kepler Finance has indexed more than 26 million SEC filings across 14,000+ companies, 50M+ public documents, and 1M+ private documents spanning 27 global markets. Claude makes that volume of unstructured data usable, interpreting questions against the entire corpus and reconciling differences in terminology across companies and time periods. Kepler's retrieval layer then pulls figures from verified SEC filings, computes the results, and assembles them into the desk's Excel template, where a single click lets analysts trace each number back to its exact line item highlighted in the source document.
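One way to make that one-click trace possible is for every computed figure to carry its provenance rather than being a bare number. A minimal sketch, with invented document identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Figure:
    """A number that carries its provenance back to the source filing."""
    value: float
    source_doc: str   # e.g. a filing identifier (hypothetical)
    line_item: str
    page: int

def trace(fig: Figure) -> str:
    """What a one-click trace might surface for the analyst."""
    return f"{fig.value} <- '{fig.line_item}' in {fig.source_doc}, p.{fig.page}"

rev = Figure(1200.0, "Example Corp 10-K FY2024", "Net sales", page=28)
print(trace(rev))
```

Because the record is immutable (`frozen=True`), provenance cannot drift apart from the value as figures flow through later pipeline stages.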

The separation between Claude's reasoning and Kepler's deterministic infrastructure lets a small team build at this scale. Claude handles the interpretation layer that would otherwise require many domain-specific NLP engineers and Kepler's infrastructure handles the rest. New capabilities that would take a large team months to ship can be built in weeks because the architecture is modular: the team improves the reasoning at one stage without touching the rest of the pipeline.

As financial institutions require compliance infrastructure before they engage, Kepler has built full audit logging, siloed customer environments, and end-to-end provenance from the start, and has SOC 2 Type II certification, with ISO 27001 certification underway.

Kepler’s platform is domain-agnostic by design. The team started in finance deliberately as it’s one of the most demanding environments for AI, with dense data, overloaded terminology, complex calculations, and zero tolerance for error. The architecture built to survive that scrutiny applies wherever professionals need verifiable answers from large document collections. From healthcare providers reconciling clinical trial data against treatment protocols to legal teams tracing precedent across decades of case law, the pattern is the same: Claude reasons about the question and infrastructure guarantees the answer. 

"Kepler Finance is our first product," says Ganesh. "It won’t be the last."

Best practices from the Kepler team
- **Give Claude the right job.** Retrieval is a job for a query engine. Computation is a job for a formula engine. Ask Claude to interpret, decompose, or reason.
- **Match models to stages.** Use Opus for complex reasoning and Sonnet for constrained, high-throughput tasks. Running everything on one model leaves either quality or cost on the table.
- **Invest in evaluation before prompts.** Build automated pipelines that test Claude's output against known-correct answers at every stage. Test each stage independently and the full pipeline end-to-end. In finance, a silent regression is how you lose a client permanently.
- **Build for provenance from day one.** Professionals are trained to verify everything. Provenance has to shape the entire system, not get added at the end.

Build your startup on the Claude Platform.
