原文
Section 04
Model benchmarks for GPT-5.1, Claude Sonnet 4.5, GPT-5-Codex, Claude Opus 4.5, and Gemini 3 Pro to understand how they behave as backends for coding agents across latency, throughput, rate limits, cold starts, cost, and tokenization efficiency.