Fly.io has GPUs now

原始链接: https://fly.io/blog/fly-io-has-gpus-now/

Summary: Fly.io has announced its newest feature: GPUs. Developers can now run intensive AI workloads (machine learning, computer vision, natural language processing) close to their users. Fly.io offers Nvidia A100 cards with 40GB or 80GB of memory, backed by eight AMD EPYC cores by default, with dedicated-host options for those who need maximum performance. GPU-backed Fly Apps can be launched immediately in several regions worldwide, including Chicago and Amsterdam, at hourly rates starting at $2.50, and reserved instances are available at a discount. Apps autoscale on demand, so there is no charge for idle time: welcome news for anyone hoping to get an emergency sandwich recipe within seconds. To learn more, visit https://fly.io, follow @flydotio on Twitter, or join the community forum.


Original article
[Illustration by Annie Ruygt: a cartoon of a green-haired woman with a ponytail looking through a portal in a datacentre at a graceful llama.]

We’re Fly.io, a new public cloud that lets you put your compute where it matters: near your users. Today we’re announcing that you can do this with GPUs too, letting you run AI workloads on the edge. Want to find out more? Keep reading.

AI is pretty fly

AI is apparently a bit of a thing (maybe even a Thing, come to think of it). We’ve seen entire industries transformed in the wake of ChatGPT (somehow it’s only been around for a year; I can’t believe it either). It’s likely to leave a huge mark on society as a whole, the same way the Internet did once we got search engines. Like any good venture-capital-funded infrastructure provider, we want to enable you to do hilarious things with AI using industrial-grade muscle.

Fly.io lets you run a full-stack app - or an entire dev platform based on the Fly Machines API - close to your users. Fly.io GPUs let you attach an Nvidia A100 to whatever you’re building, harnessing the full power of CUDA with more VRAM than your local 4090 can shake a ray-traced stick at. With these cards (or whatever you call a GPU attached to SXM fabric), AI/ML workloads are at your fingertips. You can recognize speech, segment text, summarize articles, synthesize images, and more at speeds that would make your homelab blush. You can even set one up as your programming companion with your model of choice in case you’ve just not been feeling it with the output of other models changing over time.

If you want to find out more about what these cards are and what using them is like, check out What are these “GPUs” really? It covers the history of GPUs and why it’s ironic that the cards we offer are called “Graphics Processing Units” in the first place.

Fly.io GPUs in Action

We want you to deploy your own code with your favorite models on top of Fly.io’s cloud backbone. Fly.io GPUs make this really easy.

You can get a GPU app running Ollama (our friends in text generation) in two steps:

  1. Put this in your fly.toml:
app = "sandwich_ai"
primary_region = "ord"
vm.size = "a100-40gb"

[build]
  image = "ollama/ollama"

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "100gb"
  2. Run fly apps create sandwich_ai && fly deploy.

If you want to read more about how to start your new sandwich empire, check out Scaling Large Language Models to zero with Ollama; it explains how to set up Ollama so that it automatically scales itself down when it’s not in use.
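Once the app is up, you can smoke-test it over Ollama’s REST API. Here’s a minimal sketch (not from the original post), assuming Fly’s proxy is configured to forward HTTPS traffic to Ollama’s default port 11434 and that the app answers at a hostname like sandwich_ai.fly.dev:

# One-time: pull a model onto the attached volume (the model name is just an example).
curl https://sandwich_ai.fly.dev/api/pull -d '{"name": "yi:34b"}'

# Ask for an emergency sandwich recipe; "stream": false returns a single JSON response.
curl https://sandwich_ai.fly.dev/api/generate -d '{
  "model": "yi:34b",
  "prompt": "I have rye bread, cheddar, and pickles. Emergency sandwich recipe, please.",
  "stream": false
}'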

The speed of light is only so fast

Being able to spin up GPUs is great, but where Fly.io really shines is inference at the edge.

Let’s say you have an app that lets users enter ingredients they have in their kitchen and receive a sandwich recipe. Your users expect their recipes instantly (or at least as fast as the other leading apps). Seconds count when you need an emergency sandwich.


[Screenshot: a conversation between a user and an AI; the user asks for a sandwich recipe.]

It’s depressingly customary in the AI industry to cherry-pick outputs. This was not cherry-picked. I used yi:34b to generate this recipe. I’m not sure what a taco salad sandwich is, but I might be willing to try it.

In the previous snippet, we deployed our app to ord (primary_region = "ord"). The good news is that our model returns a result really quickly, and users in Chicago get instant sandwich recipes. It’s a good experience for users near your datacentre, and you can do this on any half-decent cloud provider.

But surely people outside of Chicago need sandwiches too. Amsterdam has sandwich fiends as well, and sometimes it takes too long for their requests to leap across the pond. The speed of light is only so fast, after all. Don’t worry, we’ve got your back. Fly.io has GPUs in datacentres all over the world. What’s more, we’ll let you run the same program with the same public IP address and the same TLS certificates in any region with GPU support.

Don’t believe us? See how you can scale your app up in Amsterdam with one command:

fly scale count 2 --region ams

It’s that easy.
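If you want to confirm where those Machines landed, flyctl can show you. A quick sketch, reusing the hypothetical app name from earlier (flags as of this writing):

# Shows each Machine for the app, along with its region and state.
fly status -a sandwich_ai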

Actually On-Demand

GPUs are powerful parallel processing packages, but they’re not cheap! Once we have enough people wanting to turn their fridge contents into tasty sandwiches, keeping a GPU or two running makes sense. But we’re just a small app still growing our user base while also funding the latest large sandwich model research. We want to only pay for GPUs when a user makes a request.

Let’s open up that fly.toml again and add a services section with instructions on how we want our app to scale up and down:

[[services]]
  internal_port = 8080
  protocol = "tcp"
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0

Now when no one needs sandwich recipes, you don’t pay for GPU time.
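You can watch this happen: after a few minutes without traffic, the proxy stops the Machine, and it stays stopped until the next request wakes it. A sketch with the same hypothetical app name:

# Right after a request, the Machine serving it shows as "started".
# After a few idle minutes it shows as "stopped", and GPU billing stops with it.
fly machine list -a sandwich_ai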

The Deets

We have GPUs ready to use in several US and EU regions, plus Sydney. You can deploy your sandwich, music generation, or AI illustration apps to any of them, including ord (Chicago) and ams (Amsterdam) from the examples above.

By default, anything you deploy to GPUs will use eight heckin’ AMD EPYC CPU cores, and you can attach volumes up to 500 gigabytes. We’ll even give you discounts for reserved instances and dedicated hosts if you ask nicely.
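If 40GB of VRAM or a 100GB volume turns out to be too small, both are one-line changes in fly.toml. A sketch; the a100-80gb size name is an assumption extrapolated from the a100-40gb name used above:

vm.size = "a100-80gb"

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "500gb"  # the 500GB maximum mentioned above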

We hope you have fun with these new cards, and we’d love to see what you can do with them! Reach out to us on X (formerly Twitter) or the community forum, share what you’ve been up to, and tell us what we could make easier.
