State of AI: An Empirical 100T Token Study with OpenRouter

Original link: https://openrouter.ai/state-of-ai

## LLM Usage: Beyond the Hype – A Data-Driven Summary

Recent research reveals that large language model (LLM) adoption is more complex than commonly assumed. Rather than being dominated by a single model, the field supports a thriving, diverse ecosystem in which closed models (OpenAI, Anthropic) and increasingly capable open models (DeepSeek, Qwen) share significant usage, with open models sometimes accounting for more than 30% of tokens.

Surprisingly, much of LLM usage is not productivity-centric. Roleplay and entertainment account for over 50% of open-model usage, pointing to a large opportunity in consumer-facing interactive applications. Usage is also evolving *from* simple prompting *to* complex, multi-step "agentic inference," in which models plan and execute tasks.

Geographically, LLM demand is expanding rapidly beyond North America, with Asia now accounting for 31% of total usage. Crucially, price is not the sole driver of adoption: users prioritize quality, reliability, and capability. Finally, *retention*, finding the model that fits a particular workload exactly, is emerging as the key indicator of long-term success, ahead of simple growth metrics.

These findings underscore the need for a flexible, global approach to LLM development and deployment, grounded in real-world usage patterns.

## State of AI: The OpenRouter 100T Token Study - Summary

A recent OpenRouter study analyzing 100 trillion tokens processed through its API offers insight into AI usage trends. A key finding is the dominance of roleplay applications (52% of open-source AI usage), likely attributable to fewer content restrictions and greater creative freedom. However, the data is limited to the OpenRouter platform, so usage of self-hosted small models is not captured, which may distort the observed decline in small models' share on the platform.

The surrounding discussion raised concerns about OpenRouter's data practices, with users questioning the ethics of analyzing user data even when it is anonymized. The study also revealed that a few large accounts have an outsized impact on overall token volume, and noted that OpenRouter's "Apps" leaderboard may create incentives for token burning.

Interestingly, Singapore ranks second in token volume, possibly reflecting Chinese usage routed through VPNs. The report also suggests that models such as Grok Code are popular because of free-access options, and highlights the value of the seamless model switching that platforms like OpenRouter provide, despite the risk that users eventually bypass the platform and access models directly.

## Original Text

This empirical study offers a data-driven perspective on how LLMs are actually being used, highlighting several themes that nuance the conventional wisdom about AI deployment:

1. A Multi-Model Ecosystem. Our analysis shows that no single model dominates all usage. Instead, we observe a rich multi-model ecosystem with both closed and open models capturing significant shares. For example, even though OpenAI and Anthropic models lead in many programming and knowledge tasks, open source models like DeepSeek and Qwen collectively served a large portion of total tokens (sometimes over 30%). This suggests the future of LLM usage is likely model-agnostic and heterogeneous. For developers, this means maintaining flexibility, integrating multiple models and choosing the best for each job, rather than betting everything on one model's supremacy. For model providers, it underscores that competition can come from unexpected places (e.g., a community model might erode part of your market unless you continuously improve and differentiate).
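The flexible, model-agnostic integration described above can be sketched as a request that names a primary model plus ordered fallbacks. This is a minimal sketch assuming OpenRouter's OpenAI-compatible chat-completions endpoint; the `models` fallback field follows OpenRouter's public docs, but field names should be verified against the current API before use:

```python
import json

# OpenRouter's OpenAI-compatible endpoint (assumption: verify against current docs)
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_fallback_request(prompt, models):
    """Build a request body naming a primary model plus ordered fallbacks.

    The `models` field is OpenRouter's fallback list: if the primary model
    is unavailable, the router tries the next candidate in order.
    """
    primary, *fallbacks = models
    return {
        "model": primary,
        "models": fallbacks,  # ordered fallback candidates
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_fallback_request(
    "Summarize this diff.",
    ["anthropic/claude-3.5-sonnet", "deepseek/deepseek-chat", "qwen/qwen-2.5-72b-instruct"],
)
# Send with any HTTP client, e.g.:
#   requests.post(OPENROUTER_URL, headers={"Authorization": f"Bearer {key}"}, json=payload)
print(json.dumps(payload, indent=2))
```

Keeping model IDs in configuration like this, rather than hard-coding one vendor, is what lets an application ride the multi-model ecosystem rather than bet on a single model's supremacy.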

2. Usage Diversity Beyond Productivity. A surprising finding is the sheer volume of roleplay and entertainment-oriented usage. Over half of open source model usage was for roleplay and storytelling. Even on proprietary platforms, a non-trivial fraction of early ChatGPT use was casual and creative before professional use cases grew. This counters an assumption that LLMs are mostly used for writing code, emails, or summaries. In reality, many users engage with these models for companionship or exploration. This has important implications. It highlights a substantial opportunity for consumer-facing applications that merge narrative design, emotional engagement, and interactivity. It suggests new frontiers for personalization—agents that evolve personalities, remember preferences, or sustain long-form interactions. It also redefines model evaluation metrics: success may depend less on factual accuracy and more on consistency, coherence, and the ability to sustain engaging dialog. Finally, it opens a pathway for crossovers between AI and entertainment IP, with potential in interactive storytelling, gaming, and creator-driven virtual characters.

3. Agents vs Humans: The Rise of Agentic Inference. LLM usage is shifting from single-turn interactions to agentic inference, where models plan, reason, and execute across multiple steps. Rather than producing one-off responses, they now coordinate tool calls, access external data, and iteratively refine outputs to achieve a goal. Early evidence shows rising multi-step queries and chained tool use, which we use as a proxy for agentic workloads. As this paradigm expands, evaluation will move from language quality to task completion and efficiency. The next competitive frontier is how effectively models can perform sustained reasoning, a shift that may ultimately redefine what agentic inference at scale means in practice.
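The plan-act-refine pattern above can be illustrated with a minimal agent loop. Everything here is hypothetical scaffolding (the stubbed `stub_model` function, the toy `calculator` tool, the step budget), not the study's or any provider's implementation; a real agent would replace the stub with a model API call:

```python
def calculator(expression):
    """A toy 'tool' the agent can call (hypothetical example tool)."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only; never eval untrusted input

def stub_model(history):
    """Stand-in for an LLM: emits one tool call, then a final answer.

    A real agent would send `history` to a model API and parse its reply.
    """
    if not any(msg["role"] == "tool" for msg in history):
        return {"type": "tool_call", "tool": "calculator", "args": "6 * 7"}
    tool_result = [m for m in history if m["role"] == "tool"][-1]["content"]
    return {"type": "final", "content": f"The answer is {tool_result}."}

def agent_loop(task, model, tools, max_steps=5):
    """Agentic loop: at each step the model either calls a tool or finishes."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(history)
        if action["type"] == "final":
            return action["content"]
        result = tools[action["tool"]](action["args"])  # execute the requested tool
        history.append({"role": "tool", "content": result})
    return "stopped: step budget exhausted"

print(agent_loop("What is 6 * 7?", stub_model, {"calculator": calculator}))
```

Note how evaluation of such a loop naturally shifts from language quality to whether the task completed and how many steps it took, exactly the metric change the paragraph anticipates.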

4. Geographic Outlook. LLM usage is becoming increasingly global and decentralized, with rapid growth beyond North America. Asia's share of total token demand has risen from about 13% to 31%, reflecting stronger enterprise adoption and innovation. Meanwhile, China has emerged as a major force, not only through domestic consumption but also by producing globally competitive models. The broader takeaway: LLMs must be globally useful, performing well across languages, contexts, and markets. The next phase of competition will hinge on cultural adaptability and multilingual capability, not just model scale.

5. Cost vs. Usage Dynamics. The LLM market does not seem to behave like a commodity just yet: price alone explains little about usage. Users balance cost with reasoning quality, reliability, and breadth of capability. Closed models continue to capture high-value, revenue-linked workloads, while open models dominate lower-cost and high-volume tasks. This creates a dynamic equilibrium—one defined less by stability and more by constant pressure from below. Open source models continuously push the efficient frontier, especially in reasoning and coding domains, where rapid iteration and OSS innovations narrow the performance gap. Each improvement in open models compresses the pricing power of proprietary systems, forcing them to justify premiums through superior integration, consistency, and enterprise support. The resulting competition is fast-moving, asymmetric, and continuously shifting. Over time, as quality convergence accelerates, price elasticity is likely to increase, turning what was once a differentiated market into a more fluid one.

6. Retention and the Cinderella Glass Slipper Phenomenon. As foundation models advance in leaps, not steps, retention has become the true measure of defensibility. Each breakthrough creates a fleeting launch window where a model can "fit" a high-value workload perfectly (the Cinderella Glass Slipper moment) and once users find that fit, they stay. In this paradigm, product-market fit equals workload-model fit: being the first to solve a real pain point drives deep, sticky adoption as users build workflows and habits around that capability. Switching then becomes costly, both technically and behaviorally. For builders and investors, the signal to watch isn't growth but retention curves, namely, the formation of foundational cohorts who stay through model updates. In an increasingly fast-moving market, capturing these important unmet needs early determines who endures after the next capability leap.
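As a concrete illustration of watching retention curves rather than raw growth, one can bucket users into cohorts by first-active period and measure what fraction remain active in later periods. This is a generic sketch on made-up event data, not the study's actual methodology:

```python
from collections import defaultdict

def retention_curves(events):
    """Compute per-cohort retention from (user, period) activity events.

    A user's cohort is their first active period; retention at offset k is
    the share of that cohort still active k periods after joining.
    """
    first_seen = {}
    active = defaultdict(set)                    # period -> users active then
    for user, period in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, period)
        active[period].add(user)

    cohorts = defaultdict(set)                   # cohort period -> its users
    for user, start in first_seen.items():
        cohorts[start].add(user)

    max_period = max(p for _, p in events)
    return {
        start: [len(users & active[start + k]) / len(users)
                for k in range(max_period - start + 1)]
        for start, users in cohorts.items()
    }

# Toy usage log: (user_id, week_index)
events = [("a", 0), ("b", 0), ("a", 1), ("a", 2), ("c", 1), ("c", 2)]
print(retention_curves(events))  # e.g. week-0 cohort: [1.0, 0.5, 0.5]
```

A cohort whose curve flattens well above zero is the "foundational cohort" described above: users who found workload-model fit and stayed through subsequent model updates.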

Taken together, these findings suggest that LLMs are becoming an essential computational substrate for reasoning-like tasks across domains, from programming to creative writing. As models continue to advance and deployment expands, having accurate insights on real-world usage dynamics will be crucial for making informed decisions. The ways in which people use LLMs do not always align with expectations and vary significantly country by country, state by state, and use case by use case. By observing usage at scale, we can ground our understanding of LLM impact in reality, ensuring that subsequent developments, be they technical improvements, product features, or regulations, are aligned with actual usage patterns and needs. We hope this work serves as a foundation for more empirical studies and that it encourages the AI community to continuously measure and learn from real-world usage as we build the next generation of frontier models.
