Gemini 3.1 Pro

Original link: https://deepmind.google/models/model-cards/gemini-3-1-pro/

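This report describes the safety evaluation of Gemini 3.1 Pro under Google DeepMind's Frontier Safety Framework (FSF). The FSF applies rigorous testing across five key risk domains (CBRN, cyber, harmful manipulation, machine learning R&D, and misalignment) to prevent models from reaching Critical Capability Levels (CCLs).

Gemini 3.1 Pro underwent a full evaluation, with a focus on its Deep Think mode. The model remains *below* the alert thresholds for the CBRN, harmful manipulation, machine learning R&D, and misalignment CCLs. Previous models had passed the alert threshold for cyber. For that reason, Gemini 3.1 Pro received additional cyber testing both with and without Deep Think mode, and this testing found that the model remains below the cyber CCL.

The strategy relies on a "safety buffer." If a model does not reach the alert threshold for a CCL, then models developed before the next regular testing interval can be assumed not to reach that CCL. The buffer is maintained through continuous testing at a fixed cadence, plus extra evaluations whenever a significant capability jump is detected. For more on the evaluation process and the mitigations deployed, see the full Gemini 3 Pro Frontier Safety Framework Report.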

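## Gemini 3.1 Pro: Summary

Google's Gemini 3.1 Pro has been released and outperforms Gemini 3 Pro on complex tasks such as SVG generation and coding. Users report that the model gives detailed responses and uses its Google Search integration to provide comprehensive information. Pricing is unchanged at $2 per million input tokens and $12 per million output tokens.

However, some users find Gemini's output overly verbose and "helpful." It adds unwanted refactors or comments even when explicitly told not to. It does well on knowledge and visual tasks, such as generating images from prompts (a pelican riding a bicycle is a popular test!). It currently lags competitors such as Opus 4.6 on agentic workflows and tool use.

Despite these limitations, Gemini 3.1 Pro achieves leading results on benchmarks such as Terminal-Bench 2.0 and Artificial Analysis's Intelligence and Coding Indexes. Google's marketing has drawn criticism. Even so, the update is a significant step forward, especially for users inside the Google ecosystem, because it comes bundled with services such as storage and Workspace access. Some users still worry that the model's capabilities may be "nerfed" in the future.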

## Original text

Our Frontier Safety Framework includes rigorous evaluations that address risks of severe harm from frontier models, covering five risk domains: CBRN (chemical, biological, radiological and nuclear information risks), cyber, harmful manipulation, machine learning R&D and misalignment.

Our frontier safety strategy is based on a “safety buffer” to prevent models from reaching critical capability levels (CCLs), i.e. if a frontier model does not reach the alert threshold for a CCL, we can assume models developed before the next regular testing interval will not reach that CCL. We conduct continuous testing, evaluating models at a fixed cadence and when a significant capability jump is detected. (Read more about this in our approach to technical AGI safety.)

Following FSF protocols, we conducted a full evaluation of Gemini 3.1 Pro (focusing on Deep Think mode). We found that the model remains below alert thresholds for the CBRN, harmful manipulation, machine learning R&D, and misalignment CCLs. As previous models passed the alert threshold for cyber, we performed additional testing in this domain on Gemini 3.1 Pro with and without Deep Think mode, and found that the model remains below the cyber CCL.

More details on our evaluations and the mitigations we deploy can be found in the Gemini 3 Pro Frontier Safety Framework Report.
