OpenAI is too cheap to beat

Original link: https://generatingconversation.substack.com/p/openai-is-too-cheap-to-beat

Since the early days of the internet, data flywheels have built giant companies — first Google and the social-media platforms, and now OpenAI and the other LLM providers. Through chatbots, virtual assistants, and API traffic, OpenAI collects enormous amounts of usage data — customer-support conversations, email drafting, social-media interactions — along with explicit and implicit feedback, all of which feeds future model improvements. By pouring investment into better models and infrastructure, OpenAI can price its services so low that competitors have little room to undercut it; Anthropic's CEO has predicted that models costing $10 billion will arrive within a couple of years. These providers also offer a level of scalability and service quality that leaves few customers with a reason to deploy and operate models themselves: as a concrete example, using OpenAI's fine-tuning API works out roughly 8-20x cheaper than fine-tuning and serving a comparable model on rented hardware.

Open-source models still have opportunities to grow, chiefly by getting smaller and cheaper, which would cut the cost and complexity of customizing them for domain-specific applications — an attractive path for startups and new entrants with limited initial capital. For everything else, the major model providers dominate on cost-efficiency and service delivery thanks to their enormous resources and economies of scale. The central challenge for everyone else remains balancing operating cost against model quality, even as the AI race accelerates toward systems that handle ever more complex language, tone, and contextual nuance.

On the distillation front, mixture-of-experts (MoE) architectures may offer an alternative to today's monolithic LLMs for tasks like driving assistance or conversational generation. The approach allows more efficient fine-tuning, reducing the amount of labeled data a specific capability requires while preserving accuracy: a soft selection mechanism combines multiple smaller experts with distinct skills, so task-specific behavior can be built from less data and fewer labels. Models distilled from earlier GPT versions (e.g., a distilled GPT-3) are another potential route to the same goal. MoE remains an area of active research and development, but its ability to compress complex models and transfer knowledge efficiently makes it a promising direction.
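
To make the "soft selection" idea concrete, here is a minimal sketch of MoE-style gating in Python — the linear "experts", dimensions, and random weights are all illustrative stand-ins for real trained sub-networks:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_out, n_experts = 16, 8, 4

# Each "expert" here is just a linear map; in a real MoE each is a full sub-network.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_in, n_experts))  # learned gating weights

def moe_forward(x):
    # Soft selection: the gate scores every expert for this input,
    # and the output is the weighted sum of all expert outputs.
    weights = softmax(x @ gate_w)                 # shape: (n_experts,)
    outputs = np.stack([x @ W for W in experts])  # shape: (n_experts, d_out)
    return weights @ outputs                      # shape: (d_out,)

x = rng.normal(size=d_in)
print(moe_forward(x))
```

In practice the gate is often made sparse (top-k selection) so only a few experts run per input, which is what gives MoE its efficiency advantage over dense models.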

Original text

Since the beginning of the internet, data flywheels have created giant companies — first Google, quickly followed by social media companies, and now, OpenAI and other LLM providers.

OpenAI alone likely has more usage than all the other model providers combined, with Google and Anthropic making up most of the rest. These companies are collecting enormous amounts of data — not only can they see user prompts, they also get explicit feedback (thumbs up/thumbs down) as well as implicit feedback (e.g., asking more questions if you didn’t get the answer you wanted). Better yet, they are also at the forefront of customer conversations, understanding where LLM users are pushing the boundaries and where the models fail.

All of this is grist for the mill of future model training, and investment is only accelerating: Anthropic CEO Dario Amodei recently predicted that we’ll have models that cost $10B in 2 years.

Model quality is definitely a big advantage, but it’s only a part of the story. The more impressive moat these companies have is the scalability of their infrastructure and the quality of their service. Let’s look at fine-tuning APIs as an illustrative example.

Our team at RunLLM has been running experiments recently with the GPT fine-tuning API. A single fine-tuning run on GPT-3.5 costs us anywhere from $4-12 and takes about 1-1.5 hours to fine-tune over about 1 million tokens.
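
For reference, kicking off such a run looks roughly like this with the openai Python client — the file name and the absence of custom hyperparameters here are illustrative, not the exact setup behind the experiments above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of chat-formatted training examples.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),  # illustrative file name
    purpose="fine-tune",
)

# Start a fine-tuning job on GPT-3.5; billing is per training token.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file=training_file.id,
)

# Poll for status; runs over ~1M tokens took us on the order of 1-1.5 hours.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```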

Meanwhile, a single p4d.24xlarge on AWS costs $32.77 per-hour on-demand or $19.22 per-hour if you reserve for 1 year. Each machine comes with 8 Nvidia A100 GPUs. Assuming that OpenAI only uses 8 GPUs to fine-tune GPT-3.5, it’s 3-8x cheaper to use OpenAI than it is to rent a p4d.24xlarge from Amazon — without even accounting for the technical expertise required to deploy and run the jobs.
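
A quick sanity check on that range, assuming a single one-hour run on the on-demand instance against our observed $4-12 per run:

```python
# Back-of-the-envelope: one fine-tuning run on a rented p4d.24xlarge
# (8x A100, on-demand) versus what OpenAI charged us per run.
p4d_on_demand_per_hour = 32.77
openai_cost_per_run = (4.0, 12.0)  # observed range, USD
run_hours = 1.0                    # assume the fast end of 1-1.5 hours

aws_cost_per_run = p4d_on_demand_per_hour * run_hours
for c in openai_cost_per_run:
    print(f"OpenAI at ${c:.0f}/run -> AWS is {aws_cost_per_run / c:.1f}x more expensive")
# -> roughly 2.7x to 8.2x, i.e. the 3-8x range above
```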

AWS is obviously charging a markup on its EC2 instances, but OpenAI’s costs include training and storing the model weights (likely reasonably cheap with LoRA), building & maintaining the fine-tuning infrastructure, and the expertise needed to rack & stack thousands of GPUs internally.

If we had a dense enough workload, perhaps we could justify renting a p4d.24xlarge at the yearly reserved cost. At $19.22 per-hour, we’ll be paying about $166K per-year.

Let’s assume again we’re using LoRA to fine-tune a model on 8 A100s, perhaps at 2 hours per run. We can do 12 fine-tuning runs per-day on these GPUs, or 4,380 fine-tuning runs per year. We’ll allocate one engineer to deploy, check, and validate fine-tuning runs full-time (we don’t envy them!), which will cost us perhaps $200K per-year. (Let’s also assume that we have plenty of data readily available to us to keep fine-tuning jobs going constantly.)

At $366K ($166K AWS + $200K talent), we’re paying around $80 per-fine-tuning run, about 8-20x higher than what we’re paying OpenAI!
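
Reproducing that arithmetic from first principles — the yearly AWS figure lands slightly above the rounded $166K, but the conclusion is unchanged:

```python
# Amortized cost of self-hosted fine-tuning on a 1-year reserved p4d.24xlarge.
reserved_per_hour = 19.22
hours_per_year = 24 * 365
aws_per_year = reserved_per_hour * hours_per_year  # ~$168K

engineer_per_year = 200_000   # assumed fully-loaded salary
runs_per_day = 24 / 2         # 2 hours per LoRA run, machine kept busy
runs_per_year = runs_per_day * 365  # 4,380 runs

total = aws_per_year + engineer_per_year
print(f"AWS: ${aws_per_year:,.0f}/yr, total: ${total:,.0f}/yr")
print(f"Cost per run: ${total / runs_per_year:.0f}")  # ~$84, vs $4-12 at OpenAI
```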

And this is just to fine-tune a model. While per-token inference for fine-tuned GPT-3.5 is 10x more expensive than base GPT-3.5, it is still 10x cheaper than GPT-4! Serving a model on your own hardware is significantly more expensive unless you can reach a large enough scale to fully utilize serving hardware or elastically scale (impossible when GPU availability is limited).

We’ll give the back of the envelope math a rest, but it proves a critical point: The major LLM providers’ advantage doesn’t just lie in the quality of the model but in their ability to serve models at extreme economies of scale. It simply doesn’t make sense for most organizations to run after their own open-source LLM deployments. They’ll be sinking needless time, talent, and money into an unsolvable optimization problem, while competitors will move faster and likely achieve better quality by layering on top of OpenAI.

Of course, that doesn’t mean that open-source models have no future. We touched on this last week, and our friend Nathan Lambert at Interconnects recently wrote about the future of open-source models as well. Open-source models must get smaller over time to reduce the cost, complexity, and time required to customize and run them.

For everything else, the major LLM providers will dominate.
