Software 3.1? – AI Functions

Original link: https://blog.mikegchambers.com/posts/software-31-ai-functions/

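## Software 3.1: AI Functions and Runtime Verification

Andrej Karpathy divides the evolution of software into stages: 1.0 is code written by humans, 2.0 is learned neural network weights, and 3.0 is prompting large language models. Software 3.0 (ChatGPT, Cursor, and similar tools) is widely used: you prompt, the model generates, and a *human* verifies. A new approach, **AI Functions**, proposes a shift to **runtime verification**. Today, LLMs generate code as text, and humans integrate and test it *before* deployment. AI Functions, built on the Strands Agents SDK, aim to have the LLM generate code that runs *directly* inside your application and returns native Python objects rather than just strings. Crucially, **post-conditions** (Python assertions, or even LLM-driven checks) validate the output on *every call* and trigger automatic retries when validation fails. This moves AI's involvement from development-time assistance to runtime execution, and moves trust from one-time human review to continuous automated verification. AI Functions use the `@ai_function` decorator, which accepts a natural language specification alongside these post-conditions. This enables features such as structured output (using Pydantic), multi-agent composition, and asynchronous workflows. In essence, it is "Software 3.1": a refinement of the 3.0 paradigm in which the human *specifies*, the LLM *generates and executes*, and the *machine* *verifies* at runtime. The project is open source and encourages experimentation to explore what this approach to AI-assisted development makes possible.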

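A recent Hacker News thread discussed "AI Functions", a new experimental library from AWS (via strands-labs) that executes LLM-generated code directly inside Python applications at runtime. The core idea is to use an LLM to create code "on the fly" for each function call and to validate it with automatic post-conditions. The discussion, however, was largely skeptical. Many commenters noted that similar concepts were explored 2-3 years ago and judged impractical because of maintainability problems and high compute costs. Concerns centered on the inherent ambiguity of natural language (English) as a basis for precise computation, and on regenerating code repeatedly instead of caching it. Despite the criticism, some suggested it could be useful in specific domains, such as procedural generation in game development, or for automating simple tasks such as wiring up APIs. Others saw it as a solution in search of a problem, driven by investor interest rather than practical software engineering.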

Original text

Andrej Karpathy has a version numbering scheme for how software gets written. Software 1.0 is code written by humans. Software 2.0 is neural network weights learned through optimization. Software 3.0 is prompting LLMs in plain language, which sounds nicer than calling it vibe coding (fun fact: that is also a Karpathy-coined term).

Of course, Software 3.0 is real. Millions of people are using it daily. Tools like Kiro, Cursor, Claude Code, and ChatGPT let you describe what you want and get code back. Karpathy emphasizes a ‘generation–verification loop’ in partial-autonomy tools: the model generates changes, a human verifies them, and the work iterates.

But there’s something more fundamental going on than who reviews what. Look at what the LLM actually produces in Software 3.0: text. Code as strings. JSON payloads. Markdown documents. The model generates, you receive text, and then you do everything else – integrate it into your codebase, write tests, run CI, deploy. If you’re disciplined about verification, you write test cases, but those run before deployment. Once the code ships, the tests don’t execute again. The LLM’s involvement ends when it hands you the output. Your running software has no relationship with the model that helped write it.

Now consider a different arrangement. The LLM generates code that actually runs inside your application – at call time, every time the function is invoked. It returns native Python objects – DataFrames, Pydantic models, database connections – not JSON strings you have to parse. And verification isn’t a gate you pass before deployment; it’s post-conditions that execute on every call, feeding failures back to the model for automatic retries. This changes three things at once: where AI fits in your software (runtime, not just development time), what it produces (live objects you can call methods on, not serialized text), and how you trust it (continuous automated verification, not one-time human review).
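To make the contrast between serialized text and live objects concrete, here is a small sketch using only the standard library. The `Invoice` class, its fields, and the sample values are made up for illustration and do not come from the AI Functions project.

```python
import json
from dataclasses import dataclass
from datetime import date

# Text-based hand-off: the caller receives a string and must parse and convert it.
raw = '{"customer": "Acme Corp", "due": "2025-03-31", "amount": "1200.50"}'
payload = json.loads(raw)
due_from_text = date.fromisoformat(payload["due"])  # manual conversion
amount_from_text = float(payload["amount"])         # manual conversion


# Object-based hand-off: the caller receives a typed object and can use it directly.
@dataclass
class Invoice:
    customer: str
    due: date
    amount: float

    def is_overdue(self, today: date) -> bool:
        return today > self.due


invoice = Invoice("Acme Corp", date(2025, 3, 31), 1200.50)
print(invoice.is_overdue(date(2025, 4, 2)))  # True
```

With the text hand-off, every consumer repeats the parsing and type conversion. With the object hand-off, methods such as `is_overdue` are available immediately.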

That’s the experiment at the heart of AI Functions, a new project from Strands Labs built on the Strands Agents SDK. You write a Python function with a natural language specification instead of implementation code. You attach post-conditions – plain Python assertions that define what correct output looks like. When the function is called, the LLM generates code, executes it in your Python process, returns the result as a native object, and the post-conditions verify it. If verification fails, the system retries with the error as feedback. The human never inspects the generated code. The post-conditions do the inspecting – every time.
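The generate, verify, retry cycle described above can be sketched in plain Python without any model in the loop. This is a minimal illustration under stated assumptions, not the library's implementation: `run_with_post_conditions`, `fake_generate`, `check_short`, and the `max_attempts` parameter are hypothetical names, and the stand-in generator simply returns a shorter answer once it receives feedback.

```python
from typing import Callable, Optional


def run_with_post_conditions(
    generate: Callable[[Optional[str]], object],
    post_conditions: list,
    max_attempts: int = 3,
) -> object:
    """Call `generate`, check the result, and retry with the failure message as feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        result = generate(feedback)
        try:
            for check in post_conditions:
                check(result)  # a check signals failure by raising
        except Exception as err:
            feedback = str(err)  # hand the error text to the next attempt
            continue
        return result
    raise RuntimeError(
        f"post-conditions still failing after {max_attempts} attempts: {feedback}"
    )


# Hypothetical stand-in for the model: too long at first, short once it gets feedback.
def fake_generate(feedback: Optional[str]) -> str:
    return "ok" if feedback else "this answer is far too long"


def check_short(result: str) -> None:
    words = len(result.split())
    assert words <= 2, f"expected at most 2 words, got {words}"


answer = run_with_post_conditions(fake_generate, [check_short])
```

Here the first attempt fails `check_short`, the assertion message becomes the feedback, and the second attempt passes, so `answer` is `"ok"`.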

If Software 3.0 is “human prompts, LLM generates, human verifies,” then I propose that AI Functions are Software 3.1: human specifies, LLM generates and executes, machine verifies – at runtime. Same paradigm – natural language as the programming interface. But the execution model is different. The LLM isn’t producing text for a human to integrate. It’s producing code that runs, returning objects your application uses directly, verified by post-conditions on every call. Software 3.1 is a “point release,” not a major version bump. The upgrade is in what happens after generation.

This post is a deep dive into what AI Functions are, how they work, and what automated verification makes possible.

What AI Functions Are

from ai_functions import ai_function

@ai_function
def translate_text(text: str, lang: str) -> str:
    """
    Translate the text below to the following language: {lang}.
    {text}
    """

result = translate_text("The quarterly results exceeded expectations.", lang="French")
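One way to picture what happens to the docstring: placeholders such as `{lang}` and `{text}` get filled from the arguments of each call. The sketch below shows that step with the standard library only. `render_prompt` is a hypothetical helper, and whether the library renders its prompts this way is an assumption.

```python
import inspect


def render_prompt(func, *args, **kwargs) -> str:
    """Fill a function's docstring template with the arguments of one call."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # include parameters that were left at their defaults
    template = inspect.cleandoc(func.__doc__ or "")
    return template.format(**bound.arguments)


# Same signature and docstring as the example above, without the decorator.
def translate_text(text: str, lang: str) -> str:
    """
    Translate the text below to the following language: {lang}.
    {text}
    """


prompt = render_prompt(translate_text, "Good morning.", lang="French")
print(prompt)
```

Because `str.format` supports attribute access, the same approach would also handle a placeholder like `{response.key_decisions}`. A docstring containing literal braces would need them doubled (`{{` and `}}`) in this sketch.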

from ai_functions import ai_function
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    attendees: list[str]
    key_decisions: list[str]
    action_items: list[str]

@ai_function
def summarize_meeting(transcript: str) -> MeetingSummary:
    """
    Summarize the following meeting transcript in less than 50 words.
    <transcript>
    {transcript}
    </transcript>
    """

from ai_functions import ai_function, PostConditionResult
from pydantic import BaseModel

class MeetingSummary(BaseModel):
    attendees: list[str]
    key_decisions: list[str]
    action_items: list[str]

def check_length(response: MeetingSummary):
    total = sum(len(d.split()) for d in response.key_decisions)
    assert total <= 50, f"Key decisions should total under 50 words, got {total}"

@ai_function
def check_quality(response: MeetingSummary) -> PostConditionResult:
    """
    Check if the meeting summary below satisfies the following criteria:
    - Key decisions must be specific and actionable, not vague
    - Action items must each name a responsible person
    <decisions>{response.key_decisions}</decisions>
    <actions>{response.action_items}</actions>
    """

@ai_function(post_conditions=[check_length, check_quality])
def summarize_meeting(transcript: str) -> MeetingSummary:
    """
    Summarize the following meeting transcript in less than 50 words.
    <transcript>
    {transcript}
    </transcript>
    """

from ai_functions import ai_function
from pandas import DataFrame, api

def check_invoice_dataframe(df: DataFrame):
    """Post-condition: validate DataFrame structure."""
    assert {'product_name', 'quantity', 'price', 'purchase_date'}.issubset(df.columns)
    assert api.types.is_integer_dtype(df['quantity']), "quantity must be an integer"
    assert api.types.is_float_dtype(df['price']), "price must be a float"
    assert api.types.is_datetime64_any_dtype(df['purchase_date'])
    assert not df.duplicated(subset=['product_name', 'purchase_date']).any()

@ai_function(
    code_execution_mode="local",
    code_executor_additional_imports=["pandas.*", "sqlite3", "json"],
    post_conditions=[check_invoice_dataframe],
)
def import_invoice(path: str) -> DataFrame:
    """
    The file `{path}` contains purchase logs. Extract them in a DataFrame with columns:
    - product_name (str)
    - quantity (int)
    - price (float)
    - purchase_date (datetime)
    """

from ai_functions import ai_function
from pandas import DataFrame

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*"])
async def analyze_sales_data(path: str) -> DataFrame:
    """
    Load the sales data from `{path}` and compute a summary DataFrame
    with total revenue, average order value, and top 5 products by volume.
    """

@ai_function
def write_executive_summary(company: str, financials: DataFrame) -> str:
    """
    Write a concise executive summary for {company} highlighting key trends
    and recommendations based on the provided financial data.
    """

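# Top-level `await` works in a notebook or the `python -m asyncio` REPL;
# in a script, wrap these lines in an `async def` and run it with `asyncio.run()`.
financials = await analyze_sales_data("data/q4_sales.csv")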
summary = write_executive_summary("Acme Corp", financials)
print("Top Products:", financials.head())
print("Summary:", summary)
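The `code_executor_additional_imports` lists in the examples above read like an allowlist of modules the generated code may import. The sketch below shows one way such patterns could be matched. It is purely illustrative: `is_import_allowed` is a hypothetical helper, and the matching rules (in particular, that `"pandas.*"` also covers `pandas` itself) are assumptions, not documented behavior of the library.

```python
from fnmatch import fnmatchcase


def is_import_allowed(module: str, allowed: list) -> bool:
    """Return True if `module` matches an allowlist entry.

    Assumption: a pattern like "pkg.*" covers "pkg" and all of its submodules.
    """
    for pattern in allowed:
        if fnmatchcase(module, pattern):
            return True
        if pattern.endswith(".*") and module == pattern[:-2]:
            return True
    return False


allowed = ["pandas.*", "sqlite3", "json"]
print(is_import_allowed("pandas.api.types", allowed))  # True
print(is_import_allowed("subprocess", allowed))        # False
```

An explicit allowlist keeps the default narrow: anything not named, such as `subprocess` here, is rejected.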

from ai_functions import ai_function
from ai_functions.types import PostConditionResult
from pydantic import BaseModel, Field
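from typing import Literal

# Not defined in this excerpt: websearch_tool, ReportPlan, and report.
# check_length and check_citations are shown in a later example.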

@ai_function(
    description="Search the web for a topic and return a cited summary",
    tools=[websearch_tool],
    post_conditions=[check_length, check_citations],
)
def search_agent(query: str, max_words: int = 500) -> str:
    """
    Perform a web search on the following topic and return a summary.
    Every claim must be supported by citations to sources.
    <query>{query}</query>
    """

@ai_function(
    description="Suggest the plan and organization of a report",
    tools=[websearch_tool],
)
def report_planner(topic: str) -> ReportPlan:
    """Generate a plan to write a report on: {topic}"""

@ai_function(tools=[report_planner, search_agent, report.add_section])
def report_orchestrator(topic: str) -> Literal["done"]:
    """
    Write a report on the following topic: {topic}
    """

from ai_functions import ai_function
import asyncio
import pandas as pd

@ai_function(tools=[websearch_tool])
async def research_market(company: str) -> str:
    """Research and summarize the competitive landscape and recent news for: {company}"""

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*", "yfinance.*"])
async def load_financial_data(stock: str) -> pd.DataFrame:
    """
    Use the `yfinance` Python package to retrieve the historical prices of {stock}
    in the last 30 days. Return a DataFrame with columns [date, price].
    """

@ai_function(code_execution_mode="local", code_executor_additional_imports=["pandas.*", "plotly.*"])
def write_investment_memo(company: str, research: str, financials: pd.DataFrame) -> str:
    """
    Write an investment memo for {company}. Use the market research and financial data:
    {research}
    """

async def due_diligence_workflow(company: str):
    research, financials = await asyncio.gather(
        research_market(company),
        load_financial_data(company)
    )
    # Return the memo so the caller can use it.
    return write_investment_memo(company, research, financials)
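The workflow above starts two async functions together with `asyncio.gather`. The same pattern can be tried with stub coroutines and no model or network. The `*_stub` functions, their return values, and the sleep durations are placeholders made up for this sketch.

```python
import asyncio
import time


async def research_market_stub(company: str) -> str:
    await asyncio.sleep(0.2)  # stands in for waiting on a web search
    return f"research for {company}"


async def load_financial_data_stub(stock: str) -> list:
    await asyncio.sleep(0.2)  # stands in for waiting on a data download
    return [101.0, 102.5]


async def workflow(company: str):
    start = time.perf_counter()
    # Both coroutines wait at the same time; results come back in argument order.
    research, prices = await asyncio.gather(
        research_market_stub(company),
        load_financial_data_stub(company),
    )
    elapsed = time.perf_counter() - start
    return research, prices, elapsed


research, prices, elapsed = asyncio.run(workflow("Acme Corp"))
print(research, prices, f"{elapsed:.2f}s")
```

Since the two waits overlap, the elapsed time should be close to the longer of the two sleeps rather than their sum.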

from ai_functions import ai_function, AIFunctionConfig
from pandas import DataFrame

class Configs:
    BIG_MODEL = AIFunctionConfig(model="us.anthropic.claude-sonnet-4-5-20250929-v1:0")
    FAST_MODEL = AIFunctionConfig(model="us.anthropic.claude-haiku-4-5-20251001-v1:0")
    DATA_ANALYSIS = AIFunctionConfig(
        model="us.anthropic.claude-sonnet-4-5-20250929-v1:0",
        code_execution_mode="local",
        code_executor_additional_imports=["pandas.*", "numpy.*"],
    )

@ai_function(config=Configs.DATA_ANALYSIS)
def normalize_dataset(path: str) -> DataFrame:
    """Load, clean, and normalize the dataset at `{path}` into a standard schema."""

@ai_function(config=Configs.FAST_MODEL)
def validate_email(text: str) -> bool:
    """Check if the following string is a valid email address: {text}"""

from ai_functions import ai_function, PostConditionResult

@ai_function
def check_citations(summary: str) -> PostConditionResult:
    """
    Validate if all the claims made in the following summary are supported
    by an inline citation to a credible source.
    <summary>
    {summary}
    </summary>
    """

def check_length(summary: str, max_words: int):
    assert len(summary.split()) <= max_words

@ai_function(
    tools=[websearch_tool],
    post_conditions=[check_length, check_citations],
)
def market_researcher(query: str, max_words: int = 500) -> str:
    """
    Research and provide a well-sourced answer to: {query}
    Every claim must be supported by citations to credible sources.
    """

from ai_functions import ai_function
from pydantic import BaseModel
from typing import Any, Literal
import pytest, io
from contextlib import redirect_stderr, redirect_stdout

class FeatureRequest(BaseModel):
    description: str
    test_files: list[str]

# Post-conditions can request original input arguments by name.
# Here, `feature` matches the parameter name of `implement_feature`.
def run_tests(_answer: Any, feature: FeatureRequest):
    stdio_capture = io.StringIO()
    with redirect_stdout(stdio_capture), redirect_stderr(stdio_capture):
        retcode = pytest.main(feature.test_files)
    if retcode:
        raise RuntimeError(stdio_capture.getvalue())

@ai_function(post_conditions=[run_tests])
def implement_feature(feature: FeatureRequest) -> Literal["done"]:
    """
    Implement the following feature in the current code base:
    <feature>{feature.description}</feature>
    Once done the code base should pass the following tests: {feature.test_files}
    """

def run_workflow(features: list[FeatureRequest]):
    for feature in features:
        implement_feature(feature)
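The comment in the example above notes that post-conditions can request original input arguments by name. One way this kind of name-based dispatch could work is to inspect the check's signature and pass along only the arguments it asks for. This is a sketch under assumptions: `call_post_condition` and `check_nonempty` are hypothetical, and the library may resolve arguments differently.

```python
import inspect


def call_post_condition(check, result, call_args: dict):
    """Pass `result` first, then any original arguments the check names in its signature."""
    params = list(inspect.signature(check).parameters)
    # Skip the first parameter (it receives the result) and match the rest by name.
    extra = {name: call_args[name] for name in params[1:] if name in call_args}
    return check(result, **extra)


def check_length(summary: str, max_words: int):
    assert len(summary.split()) <= max_words, f"summary exceeds {max_words} words"


def check_nonempty(summary: str):
    assert summary.strip(), "summary is empty"


# Arguments of the original call, keyed by parameter name.
call_args = {"query": "EV market", "max_words": 5}
call_post_condition(check_length, "short and sourced", call_args)    # gets max_words
call_post_condition(check_nonempty, "short and sourced", call_args)  # gets the result only
```

Each check declares only what it needs, so `check_nonempty` ignores `query` and `max_words` while `check_length` receives `max_words` without extra wiring.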


What AI Functions Are

Structured Output with Pydantic

Post-Conditions

Returning Native Python Objects

Code Execution and the Trust Model

Multi-Agent Composition

Async Execution and Parallel Workflows

Configuration Sharing

Validating More Than Output

Test Suites as Post-Conditions

Try It
