An upgraded dev experience in Google AI Studio

原始链接: https://developers.googleblog.com/en/google-ai-studio-native-code-generation-agentic-tools-upgrade/

Google AI Studio is now the fastest way to start building with the Gemini API, with access to advanced models such as Gemini 2.5 and generative media tools including Imagen, Lyria RealTime, and Veo. The platform introduces a new Build tab optimized around Gemini 2.5 Pro's coding capabilities: users can generate and iterate on web apps directly from text, image, or video prompts, view code diffs, and deploy to Cloud Run with one click. Google AI Studio proxies the API calls and applies its own free-of-charge quota to shared apps. A new Generate Media page streamlines multimodal generation with access to DeepMind's models. Lyria RealTime supports interactive music generation, while Gemini 2.5 Flash enhances the Live API with natural audio dialog and proactive background-noise filtering. Gemini 2.5 Pro and Flash also now offer text-to-speech (TTS) with customizable voices and delivery styles. In addition, Model Context Protocol (MCP) support and a new URL Context tool for content retrieval, summarization, and fact-checking are now integrated.

This Hacker News thread discusses Google's upgraded AI Studio, which features Gemini 2.5 Pro for coding and app generation and can iterate from text, image, or video prompts. Commenters see it as a step toward AI creating applications directly, likening it to the shift from assembly language to FORTRAN. Some believe it will let domain experts build tools without specialized coding knowledge, while others worry about freedom and autonomy in cloud-based coding environments. Rabbit, a startup working on an operating system that builds itself, draws both praise and skepticism. Concerns are raised about data privacy, model training practices, and the potential for AI-assisted cheating. The thread also highlights the challenge of choosing among different AI platforms and models, and the need for better file management and transfer. There is a debate over whether the Rabbit R1 is a scam, as well as discussion of the lack of coherence across Google's AI products.

Original article


Google AI Studio is the fastest place to start building with the Gemini API, with access to our most capable models, including Gemini 2.5 preview models, and generative media models like Imagen, Lyria RealTime, and Veo. At Google I/O, we announced new features to help you build and deploy complete applications, new model capabilities, and new features in the Google Gen AI SDK.
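For orientation, here is a minimal sketch of calling the Gemini API with the Python Gen AI SDK (`google-genai`); the model name is an assumption, so substitute whichever Gemini 2.5 preview model appears in your AI Studio model list.

```python
# Minimal Gemini API call with the Python Gen AI SDK: pip install google-genai
from google import genai

# Pass the key explicitly, or omit api_key and set GEMINI_API_KEY in the environment.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model name; use any Gemini 2.5 preview model
    contents="In two sentences, what can I build with Google AI Studio?",
)
print(response.text)
```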


Build apps with Gemini 2.5 Pro code generation

Gemini 2.5 Pro is incredible at coding, so we’re excited to bring it to Google AI Studio’s native code editor. It’s tightly optimized with our Gen AI SDK so it’s easier to generate apps with a simple text, image, or video prompt. The new Build tab is now your gateway to quickly build and deploy AI-powered web apps. We’ve also launched new showcase examples to experiment with new models and more.
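Outside the Build tab's UI, the same idea can be approximated directly with the Gen AI SDK: send Gemini 2.5 Pro a mockup image plus a text instruction and ask it to draft the app code. The sketch below is illustrative; the file path and model identifier are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client()

# Read a UI mockup screenshot (placeholder path) to pair with the text prompt.
with open("mockup.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed identifier; use the current 2.5 Pro preview
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Generate a single-file HTML/JS web app that implements this mockup.",
    ],
)
print(response.text)  # the generated app code, ready to iterate on
```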

Video of text adventure game being generated with Imagen + Gemini (video sped up for illustrative purposes)

In addition to app generation from a single prompt, you can continue to iterate your web app over chat. This allows you to make changes, view diffs, and even jump back to previous checkpoints to revert edits.

Compare code versions and use expanded file structure to help manage project development (video sped up for illustrative purposes)

You can also deploy those newly created apps in a single click to Cloud Run.

Quickly deploy your created apps on Cloud Run (video sped up for illustrative purposes)

Google AI Studio apps and generated code leverage a unique placeholder API key, allowing Google AI Studio to proxy all Gemini API calls. Consequently, when you share an app built in Google AI Studio, all API usage by its users is attributed to their own free-of-charge Google AI Studio usage, completely bypassing your own API key and quota. You can read more in our FAQ.

This feature is experimental, so you should always check code before sharing your project externally. Our one-shot generation has been primarily optimized to work with Gemini and Imagen models, with support for more models and tool calls coming soon.


Multimodal generation made easy in Google AI Studio

We've been working hard to get Google DeepMind’s advanced multimodal models into developers’ toolboxes, faster. The new Generate Media page centralizes the discovery of Imagen, Veo, Gemini with native image generation, and new native speech generation models. Plus, experience interactive music generation with Lyria RealTime through the PromptDJ apps built in Google AI Studio.
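The same models are also reachable programmatically. As a rough sketch (the Imagen model ID below is an assumption; check the Generate Media page for the version available to your key), an image can be generated like this:

```python
from google import genai
from google.genai import types

client = genai.Client()

result = client.models.generate_images(
    model="imagen-3.0-generate-002",  # assumed model ID
    prompt="A watercolor illustration of a lighthouse at dusk",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Each generated image exposes its raw bytes; write them to disk.
with open("lighthouse.png", "wb") as f:
    f.write(result.generated_images[0].image.image_bytes)
```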

Explore our generative media models from the Google AI Studio Generate Media tab (video sped up for illustrative purposes)

New native audio for the Live API and text-to-speech (TTS)

With Gemini 2.5 Flash native audio dialog in preview in the Live API, the model now generates even more natural responses with support for over 30 voices. We’ve also added proactive audio so the model can distinguish between the speaker and background conversations, so it knows when to respond. This makes it possible for you to build conversational AI agents and experiences that feel more intuitive and natural.
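Programmatically, native audio dialog goes through the Live API in the Gen AI SDK. The sketch below is illustrative and makes assumptions about the preview model name and the audio format; consult the Live API docs for current values.

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-2.5-flash-preview-native-audio-dialog"  # assumed preview model name

async def main():
    config = types.LiveConnectConfig(response_modalities=["AUDIO"])
    async with client.aio.live.connect(model=MODEL, config=config) as session:
        # Send one text turn; the model streams back spoken audio.
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hi! How are you?")])
        )
        audio = bytearray()
        async for message in session.receive():
            if message.data:  # inline audio bytes (raw PCM) from the model
                audio.extend(message.data)
        # `audio` now holds raw PCM that can be wrapped in a WAV container for playback.

asyncio.run(main())
```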

Try native audio dialog in the Google AI Studio Stream tab (video sped up for illustrative purposes)

In addition to the Live API, we’ve announced Gemini 2.5 Pro and Flash previews for text-to-speech (TTS) that support native audio output. Now you can craft single and multi-speaker output with flexible control over delivery style.
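A short sketch of the TTS preview via the Gen AI SDK follows; the model and voice names are assumptions based on the preview naming, and the output is raw 24 kHz, 16-bit mono PCM wrapped into a WAV file.

```python
import wave
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview model name
    contents="Say cheerfully: have a wonderful day!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# Extract the PCM bytes and write a playable WAV file.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("speech.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(24000)   # 24 kHz
    wf.writeframes(pcm)
```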

Generate speech using the new TTS capabilities (video sped up for illustrative purposes)

Try native audio in the Live API from the Stream tab and experience new TTS capabilities via Generate Speech.


Model Context Protocol (MCP) support

Model Context Protocol (MCP) definitions are also now natively supported in the Google Gen AI SDK for easier integration with a growing number of open-source tools. We’ve included a demo app that shows how you can use an MCP server within Google AI Studio that combines Google Maps and the Gemini API.
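As a rough sketch of how this can look in the Python Gen AI SDK (the MCP server command below is hypothetical; substitute any stdio MCP server you actually run), a live MCP session can be passed as a tool and the SDK handles the tool calls:

```python
import asyncio
from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = genai.Client()

# Hypothetical MCP server launch command; replace with a real server you have installed.
server_params = StdioServerParameters(command="npx", args=["-y", "@example/maps-mcp-server"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",  # assumed model name
                contents="Find a coffee shop near Mountain View and describe it.",
                config=types.GenerateContentConfig(tools=[session]),  # MCP session as a tool
            )
            print(response.text)

asyncio.run(main())
```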

New URL Context tool

URL Context is a new experimental tool that gives the model the ability to retrieve and reference content from links you provide. This is helpful for fact-checking, comparison, summarization, and deeper research.
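A minimal sketch of enabling the tool through the Gen AI SDK, assuming the `url_context` tool field exposed for this experimental feature:

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model name
    contents=(
        "Summarize the key announcements from "
        "https://developers.googleblog.com/en/google-ai-studio-native-code-generation-agentic-tools-upgrade/"
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(url_context=types.UrlContext())]  # experimental URL Context tool
    ),
)
print(response.text)
```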

Start building in Google AI Studio

We’re thrilled to bring all of these updates to Google AI Studio, making it the place for developers to explore and build with the latest models Google has to offer.

Explore this announcement and all Google I/O 2025 updates on io.google starting May 22.
