How Google built its Gemini robotics models

Original link: https://blog.google/products/gemini/how-we-built-gemini-robotics/

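Google has released Gemini Robotics, a new family of multimodal AI models that lets robots carry out complex tasks without prior task-specific training. This marks a significant advance in robotics: for example, a robot successfully executed a "slam dunk" instruction on its first attempt, even though it had never encountered basketball or the specific toy involved. Built on Gemini 2.0 and fine-tuned with robot-specific data, the models combine physical action with Gemini's existing understanding of text, video, and audio. The result is highly dexterous, interactive, and general robots that can adapt on their own to new objects, environments, and instructions. According to Google CEO Sundar Pichai, this development lays the foundation for the next generation of robotics and has a broad range of applications. The ultimate goal is to create embodied AI that powers robots able to help with everyday tasks, making them ubiquitous "agents in the physical world," much like phones or computers.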

Original text

“We’d trained models to help robots with specific tasks and to understand natural language before, but this was a step change,” Carolina says. “The robot had never seen anything related to basketball, or this specific toy. Yet it understood something complex, ‘slam dunk the ball’, and performed the action smoothly. On its first try.”

This all-rounder robot was powered by a Gemini Robotics model that is part of a new family of multimodal models for robotics. The models build upon Gemini 2.0 through fine-tuning with robot-specific data, adding physical action to Gemini’s multimodal outputs like text, video and audio. "This milestone lays the foundation for the next generation of robotics that can be helpful across a range of applications," said Google CEO Sundar Pichai when announcing the new models on X.

The Gemini Robotics models are highly dextrous, interactive and general, meaning they can drive robots to react to new objects, environments and instructions without further training. Helpful, given the team’s ambitions.

“Our mission is to build embodied AI to power robots that help you with everyday tasks in the real world,” says Carolina, whose fascination with robotics began with childhood sci-fi cartoons, fueled by dreams of automated chores. “Eventually, robots will be just another surface on which we interact with AI, like our phones or computers — agents in the physical world.”
