Why current LLM costs are not sustainable

Original link: https://aditya.patadia.org/p/ai-and-cloud-costs

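AI adoption is surging, but companies face unsustainable cost pressure, as shown by spending cuts at companies such as Uber and Microsoft. Today, "frontier" model labs use high prices to recoup enormous research, training and operating costs. That pricing model, however, is close to breaking down. The following factors are pushing costs lower:

1. **Diminishing returns:** Performance gains are leveling off, and high-quality training data is increasingly scarce.
2. **Competition and open weights:** Strong open-weight models such as GLM-5.2 are outperforming closed models at a fraction of the cost.
3. **Infrastructure progress:** The shift to specialised chips (such as TPU and Groq) and efficient architectures (such as MoE) is sharply reducing inference spend.
4. **Zero switching costs:** AI gateways make changing providers instantaneous, destroying the "moat" that traditional software once had.
5. **Local AI:** As hardware improves, running models locally on personal devices will remove the need for expensive cloud subscriptions for everyday tasks.

Taken together, these factors point to an imminent collapse in AI pricing. Cloud providers will keep the complex use cases, but the era of high-priced AI services is likely coming to an end, which is good news for consumers.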

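This Hacker News discussion examines whether current large language model (LLM) costs are sustainable, and how viable local hosting is compared with enterprise reliance on frontier models. Participants raised two main points:

* **The case for local hosting:** Supporters argue that self-hosting quantized models (such as Q8XL GLM5.2) has significant advantages, including better data privacy, control over context caching, and resilience to external service outages.
* **The enterprise reality:** Skeptics argue that large enterprises are unlikely to adopt local models because of the heavy technical overhead of managing high-performance data centers, cooling systems and model maintenance. One commenter suggested that enterprises are more likely to reallocate 5% to 10% of their payroll budget to "token budgets" and specialised operators than to replace employees outright.

In the end, the consensus leaned toward the view that local hosting is becoming easier for individuals, but the enterprise shift remains stalled by current GPU manufacturing bottlenecks and the complexity of managing large-scale AI infrastructure.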

Original text

A lot of companies are getting bitten by high AI costs. Uber burned through its entire year’s AI budget in just 4 months, and Microsoft, Salesforce and GitHub are taking steps to reduce AI spend by employees.

On the other hand, AI is making many programming tasks very easy and keeps helping in other domains such as data interpretation, making beautiful slides and designing apps and websites. Currently, the big AI labs have what we call frontier models, and those models perform exceptionally well on a wide variety of tasks. Frontier AI labs do both the research and the hosting themselves, so those models cost the most. GPT 5.5, for example, costs $5 per million input tokens and $30 per million output tokens. According to OpenRouter, this is currently the most expensive model available. To give an example, just doing TypeScript type fixes with this model across 50 files cost me $54 this afternoon.
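To make the pricing arithmetic concrete, here is a minimal Python sketch of per-request cost at the rates quoted above ($5 per million input tokens, $30 per million output tokens). The function name `request_cost` and the token counts in the usage line are illustrative assumptions, not figures from the session described.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 5.0,
                 output_price_per_m: float = 30.0) -> float:
    """Return the dollar cost of one request given per-million-token prices."""
    # Multiply before dividing so round token counts give exact results.
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Hypothetical workload: 1M input tokens and 200k output tokens.
print(f"${request_cost(1_000_000, 200_000):.2f}")  # $11.00
```

At these rates an output token costs six times as much as an input token, so output-heavy work such as rewriting many files tends to dominate the bill.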

Five factors might keep the AI labs from sustaining the high prices they are asking right now: a plateau in model performance, open-weight model releases, chip and model improvements, zero switching costs and local models.

We are seeing improvements with each model release these days, but it’s clear that the improvements are getting smaller and smaller. Unless there is a completely new breakthrough, current learning and inference capabilities can only scale so much. Training data is a problem as well. Most AI labs have likely ingested everything available in digital and print media for model training, so improving the training dataset is going to prove very difficult.

This means the labs will find it hard to keep raising model prices on the strength of better performance. We already saw evidence of this: Claude Opus 4.8 costs the same as Claude Opus 4.7. Once models stop improving significantly and the training data and methods are similar, model prices will likely drop due to competition.

OpenAI had a massive lead when it launched ChatGPT in 2022, but that lead has slowly faded, and we saw Anthropic take the top spot in 2025-26. Now GLM-5.2, an open-weight model, beats GPT and Opus in coding benchmarks at one-tenth the cost of GPT 5.5.

What is happening here is that leading AI labs are charging not only for inference but also for research into model architecture, training data collection and curation, model training (which can cost tens or even hundreds of millions of dollars), employee salaries and marketing.

On the other hand, once an open-weight model is released, any inference provider can easily host it and simply add a markup to the inference cost. This is far cheaper than running a frontier AI lab.
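The difference between the two pricing models can be sketched as simple arithmetic. In the Python example below every number is a made-up placeholder chosen only to show the shape of the calculation; `price_per_million`, the cost figures and the token volume are assumptions, not data from any lab or provider.

```python
def price_per_million(inference_cost: float, markup: float,
                      fixed_costs: float = 0.0,
                      tokens_served_millions: float = 1.0) -> float:
    """Price per million tokens: amortized fixed costs plus marked-up inference."""
    # Fixed costs (research, training, salaries, marketing) spread over volume.
    amortized = fixed_costs / tokens_served_millions
    return (inference_cost + amortized) * (1 + markup)


# Hypothetical host of an open-weight model: $1 raw inference cost per
# million tokens and a 25% markup, with no research or training to recover.
host_price = price_per_million(inference_cost=1.0, markup=0.25)

# Hypothetical frontier lab: same inference cost and markup, plus $400M of
# fixed costs spread over 100M million-token units served.
lab_price = price_per_million(inference_cost=1.0, markup=0.25,
                              fixed_costs=400_000_000,
                              tokens_served_millions=100_000_000)

print(host_price, lab_price)  # 1.25 6.25
```

Under these invented inputs the lab has to charge several times what the host charges for the same inference, which is the gap the paragraph above describes.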

Cerebras, Groq, Google and many other companies have realised that AI needs its own silicon and that normal GPUs are not cutting it. Specialised chips are very expensive to design, but once the architecture is ready, making millions of them is easy and inference becomes much cheaper. A TPU, for example, can be 30-70% cheaper than an Nvidia H100 GPU. Such advancements will keep coming and will keep pushing down the price per token.

Model architecture is also evolving. Caching was a basic early improvement, and now mixture-of-experts (MoE) models and other approaches are making models faster while keeping the same accuracy levels.

Traditional software like Windows OS, MS Office and Adobe Suite, and SaaS like Salesforce, HubSpot and Figma, had a very important moat that AI models don’t have: none of those products were interchangeable. You could not swap a CRM in an afternoon; it took months.

When more AI labs enter the space and more open-weight models are available, the absence of switching costs is going to cause a very quick price crash. AI gateway providers like OpenRouter.ai are making it extremely easy to switch models. It can happen in seconds and, in fact, we can program it to change providers on the fly. Zero switching costs mean that if a better model comes along, consumers can switch to it without any time investment.
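Switching providers on the fly can be sketched as a simple fallback loop. This Python example does not use any real gateway API: the `Provider` type, `complete_with_fallback`, and both stub providers are hypothetical stand-ins that only illustrate the pattern.

```python
from typing import Callable, Sequence

# A provider is a (name, callable) pair; the callable takes a prompt and
# returns text. Real gateways expose their own APIs; these are stand-ins.
Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stub providers for demonstration only.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def cheap_provider(prompt: str) -> str:
    return f"echo: {prompt}"


print(complete_with_fallback("fix my types", [
    ("provider-a", flaky_provider),
    ("provider-b", cheap_provider),
]))  # ('provider-b', 'echo: fix my types')
```

Because the calling code depends only on the ordered list, reordering it by price or quality is a one-line change, which is the sense in which switching costs approach zero.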

Last but not least, and in fact most important, is the ability of users to run local models. So far, almost everyone uses cloud-hosted models, and local models are either too big to deploy or too slow to work with. With advancements in chips, this will change in 4-5 years’ time. Newer chips will run models locally, and an almost certain crash in RAM prices will make it easy to deploy models on computers and smartphones. I predict most operating systems will provide a way to deploy a model, along with an interface so that locally running apps can connect to it.

When this happens, cloud models will only be used for the most complex tasks, and simple tasks like code tab completion, proofreading and fact checking will be done locally. This means customers will no longer need that $20 or $200 subscription.
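The split between local and cloud work could look something like the Python sketch below. It is only an illustration of the idea: `choose_backend`, the task labels and the `local_model_available` flag are assumptions, not an existing operating-system interface.

```python
# Task kinds the article expects to run locally; the labels are illustrative.
LOCAL_TASKS = {"tab_completion", "proofreading", "fact_checking"}


def choose_backend(task_kind: str, local_model_available: bool = True) -> str:
    """Return "local" for simple tasks when a local model exists, else "cloud"."""
    if local_model_available and task_kind in LOCAL_TASKS:
        return "local"
    return "cloud"


print(choose_backend("proofreading"))         # local
print(choose_backend("multi_repo_refactor"))  # cloud
print(choose_backend("proofreading", False))  # cloud
```

Every request that resolves to `"local"` is one the cloud subscription no longer has to serve.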

This is my first personal blog post and I have made some bold predictions here. Only time will tell how they turn out, but one thing is certain: price pressure will come from one or more of the reasons listed above, and in the end, that is good for consumers.
