Gemini Robotics-ER 1.6

原始链接: https://deepmind.google/blog/gemini-robotics-er-1-6/

## Gemini Robotics-ER 1.6: Enhanced Robot Reasoning

Google has released Gemini Robotics-ER 1.6, a significant upgrade to its AI model designed to give robots a deeper understanding of the physical world. This "embodied reasoning" lets robots go beyond merely executing commands to *reasoning* about their surroundings, which is critical for real-world applications.

The new model excels at spatial understanding, task planning, and detecting task completion. It can execute tasks autonomously by drawing on Google Search and vision-language-action models. Gemini Robotics-ER 1.6 surpasses previous versions (1.5 and Gemini 3.0 Flash) at capabilities such as pointing, counting, and especially **instrument reading** (developed in collaboration with Boston Dynamics for interpreting complex gauges).

The model is now open to developers through the Gemini API and Google AI Studio, with sample code provided to ease implementation of embodied reasoning tasks. This advance aims to unlock new levels of robot autonomy across industries.

DeepMind's Gemini Robotics-ER 1.6 is prompting discussion about the potential of the current generative AI "stack" to convincingly simulate human and animal behavior, though it is currently limited by slow inference speed. A Hacker News commenter highlighted the system's ability to, for example, interpret gauge readings through image analysis and a Python script as an encouraging step. However, the full pipeline remains too slow for real-time, complex decision-making. The core idea is that faster inference would unlock capabilities such as generating possible future scenarios from images, describing those scenarios to itself, and making decisions, mimicking unconscious brain processes. The commenter argues that a 100x to 1000x improvement in inference throughput would transform AI development.

Original post

For robots to be truly helpful in our daily lives and industries, they must do more than follow instructions; they must reason about the physical world. From navigating a complex facility to interpreting the needle on a pressure gauge, a robot’s “embodied reasoning” is what allows it to bridge the gap between digital intelligence and physical action.

Today, we’re introducing Gemini Robotics-ER 1.6, a significant upgrade to our reasoning-first model that enables robots to understand their environments with unprecedented precision. By enhancing spatial reasoning and multi-view understanding, we are bringing a new level of autonomy to the next generation of physical agents.

This model specializes in reasoning capabilities critical for robotics, including visual and spatial understanding, task planning and success detection. It acts as the high-level reasoning model for a robot, capable of executing tasks by natively calling tools like Google Search to find information, vision-language-action models (VLAs) or any other third-party user-defined functions.
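As a minimal sketch of the "user-defined functions" pattern described above: the Gemini API accepts function declarations as JSON-schema-style dictionaries, and the model responds with structured function calls that the client routes to local handlers. The `move_to` and `pick_object` skills below are hypothetical examples (not from the post), shown only to illustrate the declaration shape and a dispatch loop.

```python
# Hypothetical robot skills exposed to the model as user-defined tools.
# Each declaration follows the Gemini API function-calling format:
# a name, a description, and an OpenAPI-style parameter schema.
ROBOT_TOOLS = [
    {
        "name": "move_to",
        "description": "Drive the robot base to a named waypoint.",
        "parameters": {
            "type": "object",
            "properties": {
                "waypoint": {
                    "type": "string",
                    "description": "Waypoint label, e.g. 'charging_dock'.",
                }
            },
            "required": ["waypoint"],
        },
    },
    {
        "name": "pick_object",
        "description": "Ask a vision-language-action (VLA) model to grasp an object.",
        "parameters": {
            "type": "object",
            "properties": {
                "object_label": {
                    "type": "string",
                    "description": "Object to grasp, e.g. 'red mug'.",
                }
            },
            "required": ["object_label"],
        },
    },
]


def dispatch(call: dict) -> str:
    """Route a function call emitted by the model to a local handler (stubbed)."""
    handlers = {
        "move_to": lambda args: f"moving to {args['waypoint']}",
        "pick_object": lambda args: f"picking {args['object_label']}",
    }
    return handlers[call["name"]](call["args"])


if __name__ == "__main__":
    # A function call shaped as the model might emit it after planning a step.
    print(dispatch({"name": "pick_object", "args": {"object_label": "red mug"}}))
    # → picking red mug
```

In a real agent loop, the declarations would be passed in the request's tool configuration alongside built-in tools like Google Search, and each handler result would be fed back to the model as the next turn.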

Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash, specifically enhancing spatial and physical reasoning capabilities such as pointing, counting, and success detection. We are also unlocking a new capability: instrument reading, enabling robots to read complex gauges and sight glasses — a use case we discovered through close collaboration with our partner, Boston Dynamics.
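For the pointing capability mentioned above, the Gemini Robotics-ER 1.5 documentation describes point outputs as a JSON list of `{"point": [y, x], "label": ...}` entries with coordinates normalized to 0–1000. Assuming 1.6 keeps that convention, a small parser that maps the model's answer back to pixel coordinates might look like:

```python
import json


def parse_points(model_text: str, width: int, height: int) -> list[dict]:
    """Convert the model's pointing output to pixel coordinates.

    Assumes the documented ER output format: a JSON list of
    {"point": [y, x], "label": ...} with coordinates normalized to 0-1000.
    """
    out = []
    for p in json.loads(model_text):
        y, x = p["point"]
        out.append(
            {
                "label": p["label"],
                "x_px": round(x / 1000 * width),
                "y_px": round(y / 1000 * height),
            }
        )
    return out


# Example response for a 640x480 image: normalized (y=500, x=250)
# maps to pixel (x=160, y=240).
raw = '[{"point": [500, 250], "label": "pressure gauge"}]'
print(parse_points(raw, width=640, height=480))
```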

Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio. To help you get started, we are sharing a developer Colab containing examples of how to configure the model and prompt it for embodied reasoning tasks.
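A first call through the `google-genai` Python SDK might look like the sketch below. The model ID string `gemini-robotics-er-1.6` is an assumption (the post does not give the exact API identifier), and the network call is guarded behind an API-key check so the prompt-building helper stands on its own.

```python
import os

# Assumed model ID: the post announces availability via the Gemini API,
# but the exact string is a guess here.
MODEL_ID = "gemini-robotics-er-1.6"


def build_prompt(task: str) -> str:
    """Compose an embodied-reasoning prompt asking for pointed coordinates."""
    return (
        f"Task: {task}\n"
        "Point to each object needed for this task. Answer as a JSON list of "
        '{"point": [y, x], "label": <name>} entries with coordinates '
        "normalized to 0-1000."
    )


if __name__ == "__main__" and os.environ.get("GEMINI_API_KEY"):
    # Live call, only attempted when a key is configured.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=build_prompt("read the pressure gauge on the left wall"),
    )
    print(response.text)
```

In practice the `contents` argument would also carry the camera image alongside the text prompt; the developer Colab referenced above covers the full configuration.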
