Gemini 2.5 Flash

原始链接: https://developers.googleblog.com/en/start-building-with-gemini-25-flash/

Google has released an early preview of Gemini 2.5 Flash, a major upgrade to its 2.0 Flash model. The new version prioritizes reasoning capabilities while preserving speed and cost efficiency. Its key feature is a "thinking" capability that lets the model reason about a prompt before responding, improving accuracy on complex tasks. Developers can control a "thinking budget," measured in tokens, to fine-tune the tradeoff between quality, cost, and latency. With the budget set to 0, the model keeps 2.0 Flash's speed while still improving performance, and it automatically adjusts how much it thinks based on the complexity of the prompt. Gemini 2.5 Flash is now available through the Gemini API in Google AI Studio and Vertex AI, and developers can experiment with the thinking_budget parameter to tune performance for their specific needs.

Hacker News users are discussing Google's new Gemini 2.5 Flash. User byefruit is curious about the large price gap between the reasoning and non-reasoning versions, asking whether it reflects pure market pricing or a fundamental architectural difference rather than just a longer context window; they speculate that the 6x price difference cannot be explained by context length and token costs alone. Another user, xnx, notes that the price is 50% higher than Gemini 2.0 Flash but still considers Flash cheaper than competing models of equal or lower quality. Finally, akudha asks whether Gemini 2.5 Flash is cheaper than DeepSeek.

Original Article

Today we are rolling out an early version of Gemini 2.5 Flash in preview through the Gemini API via Google AI Studio and Vertex AI. Building upon the popular foundation of 2.0 Flash, this new version delivers a major upgrade in reasoning capabilities, while still prioritizing speed and cost. Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off. The model also allows developers to set thinking budgets to find the right tradeoff between quality, cost, and latency. Even with thinking off, developers can maintain the fast speeds of 2.0 Flash, and improve performance.

Our Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding. Instead of immediately generating an output, the model can perform a "thinking" process to better understand the prompt, break down complex tasks, and plan a response. On complex tasks that require multiple steps of reasoning (like solving math problems or analyzing research questions), the thinking process allows the model to arrive at more accurate and comprehensive answers. In fact, Gemini 2.5 Flash performs strongly on Hard Prompts in LMArena, second only to 2.5 Pro.

Comparison table showing price and performance metrics for LLMs

2.5 Flash has comparable metrics to other leading models for a fraction of the cost and size.

Our most cost-efficient thinking model

2.5 Flash continues to lead as the model with the best price-to-performance ratio.

Gemini 2.5 Flash price-to-performance comparison

Gemini 2.5 Flash adds another model to Google’s Pareto frontier of cost to quality.*

Fine-grained controls to manage thinking

We know that different use cases have different tradeoffs in quality, cost, and latency. To give developers flexibility, we’ve enabled setting a thinking budget that offers fine-grained control over the maximum number of tokens a model can generate while thinking. A higher budget allows the model to reason further to improve quality. Importantly, though, the budget sets a cap on how much 2.5 Flash can think, but the model does not use the full budget if the prompt does not require it.

Plot graphs show improvements in reasoning quality as thinking budget increases

Improvements in reasoning quality as thinking budget increases.

The model is trained to know how long to think for a given prompt, and therefore automatically decides how much to think based on the perceived task complexity.

If you want to keep the lowest cost and latency while still improving performance over 2.0 Flash, set the thinking budget to 0. You can also choose to set a specific token budget for the thinking phase using a parameter in the API or the slider in Google AI Studio and in Vertex AI. The budget can range from 0 to 24576 tokens for 2.5 Flash.
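
For example, disabling thinking is just a matter of passing a zero budget. The snippet below is a minimal sketch that mirrors the full code sample at the end of this post, changing only the thinking_budget value and reusing one of the low-reasoning prompts shown further down:

from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

# thinking_budget=0 turns thinking off, keeping 2.0 Flash-like speed and cost.
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-04-17",
  contents="How many provinces does Canada have?",
  config=genai.types.GenerateContentConfig(
    thinking_config=genai.types.ThinkingConfig(
      thinking_budget=0  # any value from 0 to 24576 is accepted for 2.5 Flash
    )
  )
)

print(response.text)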

The following prompts demonstrate how much reasoning may be used in 2.5 Flash's default mode.


Prompts requiring low reasoning:

Example 1: “Thank you” in Spanish

Example 2: How many provinces does Canada have?


Prompts requiring medium reasoning:

Example 1: You roll two dice. What’s the probability they add up to 7?

Example 2: My gym has pickup hours for basketball between 9-3pm on MWF and between 2-8pm on Tuesday and Saturday. If I work 9-6pm 5 days a week and want to play 5 hours of basketball on weekdays, create a schedule for me to make it all work.
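
As a quick sanity check on the dice prompt above (medium reasoning, Example 1), the expected answer can be verified by enumerating outcomes; a short sketch, independent of the Gemini API:

from itertools import product

# All 36 equally likely outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# The pairs that sum to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1).
favorable = [pair for pair in outcomes if sum(pair) == 7]

print(f"{len(favorable)}/{len(outcomes)}")  # 6/36, i.e. a probability of 1/6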


Prompts requiring high reasoning:

Example 1: A cantilever beam of length L=3m has a rectangular cross-section (width b=0.1m, height h=0.2m) and is made of steel (E=200 GPa). It is subjected to a uniformly distributed load w=5 kN/m along its entire length and a point load P=10 kN at its free end. Calculate the maximum bending stress (σ_max).

Example 2: Write a function evaluate_cells(cells: Dict[str, str]) -> Dict[str, float] that computes the values of spreadsheet cells.

Each cell contains:

  • Either a number (e.g. "3")
  • Or a formula like "=A1 + B1 * 2" using +, -, *, / and other cells.

Requirements:

  • Resolve dependencies between cells.
  • Handle operator precedence (*/ before +-).
  • Detect cycles and raise ValueError("Cycle detected at <cell>").
  • No eval(). Use only built-in libraries.
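
One way to satisfy these requirements is sketched below, assuming formulas contain only cell references, non-negative numeric literals, and the four binary operators (no parentheses or unary minus); dependencies are resolved with memoized recursion, and an in-progress set provides cycle detection:

import re
from typing import Dict

def evaluate_cells(cells: Dict[str, str]) -> Dict[str, float]:
    resolved: Dict[str, float] = {}   # memoized cell values
    in_progress: set = set()          # cells currently being resolved (cycle detection)

    def tokenize(expr: str):
        # Numbers, cell references like A1, and the operators + - * /.
        return re.findall(r"\d+\.\d+|\d+|[A-Za-z]+\d+|[+\-*/]", expr)

    def value_of(token: str) -> float:
        if re.fullmatch(r"\d+(\.\d+)?", token):
            return float(token)
        return resolve(token)  # otherwise it is a cell reference

    def evaluate(tokens) -> float:
        # First pass: fold * and / into the running values (higher precedence).
        values = [value_of(tokens[0])]
        pending = []
        for i in range(1, len(tokens), 2):
            op, operand = tokens[i], value_of(tokens[i + 1])
            if op == "*":
                values[-1] *= operand
            elif op == "/":
                values[-1] /= operand
            else:
                pending.append(op)
                values.append(operand)
        # Second pass: apply + and - left to right.
        result = values[0]
        for op, v in zip(pending, values[1:]):
            result = result + v if op == "+" else result - v
        return result

    def resolve(name: str) -> float:
        if name in resolved:
            return resolved[name]
        if name in in_progress:
            raise ValueError(f"Cycle detected at {name}")
        in_progress.add(name)
        raw = cells[name].strip()
        value = evaluate(tokenize(raw[1:])) if raw.startswith("=") else float(raw)
        in_progress.discard(name)
        resolved[name] = value
        return value

    return {cell: resolve(cell) for cell in cells}

# Example: C1 depends on A1 and B1, and * binds tighter than +.
print(evaluate_cells({"A1": "2", "B1": "3", "C1": "=A1 + B1 * 2"}))
# {'A1': 2.0, 'B1': 3.0, 'C1': 8.0}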


Start building with Gemini 2.5 Flash today

Gemini 2.5 Flash with thinking capabilities is now available in preview via the Gemini API in Google AI Studio and in Vertex AI, and in a dedicated dropdown in the Gemini app. We encourage you to experiment with the thinking_budget parameter and explore how controllable reasoning can help you solve more complex problems.

from google import genai

client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
  model="gemini-2.5-flash-preview-04-17",
  contents="You roll two dice. What’s the probability they add up to 7?",
  config=genai.types.GenerateContentConfig(
    thinking_config=genai.types.ThinkingConfig(
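      # Cap on the tokens the model may spend thinking (0 to 24576 for 2.5 Flash).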
      thinking_budget=1024
    )
  )
)

print(response.text)

Find detailed API references and thinking guides in our developer docs or get started with code examples from the Gemini Cookbook.

We will continue to improve Gemini 2.5 Flash, with more coming soon, before we make it generally available for full production use.


*Model pricing is sourced from Artificial Analysis & Company Documentation

Contact us: contact @ memedata.com