Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents

Original link: https://machinelearning.apple.com/research/ferret-ui

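## Ferret-UI Lite: A Compact GUI Agent

This article introduces Ferret-UI Lite, a 3-billion-parameter GUI agent designed to run efficiently on-device across mobile, web, and desktop platforms. To build an effective GUI-interaction agent under a tight model-size budget, the researchers combine several techniques: a carefully curated mixture of real and synthetic GUI data, stronger reasoning through chain-of-thought prompting and visual tool use, and reinforcement learning with targeted rewards.

Ferret-UI Lite is competitive with other small agents on standard benchmarks: it reaches 91.6% accuracy on ScreenSpot-V2 (GUI grounding), and success rates of 28.0% and 19.8% on AndroidWorld and OSWorld (GUI navigation), respectively. The authors share their methods and lessons learned to support further development of compact, practical GUI agents for on-device use.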

Paper link (posted by brudgers in the Hacker News discussion): https://arxiv.org/pdf/2509.26539

Original text

Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Utilizing techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent through curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool-use, and reinforcement learning with designed rewards. Ferret-UI Lite achieves competitive performance with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld and 19.8% on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.
