Anthropic publishes the 'system prompts' that make Claude tick

Original link: https://techcrunch.com/2024/08/26/anthropic-publishes-the-system-prompt-that-makes-claude-tick/

Generative AI models such as GPT-4 have no humanlike intelligence or personality; they generate text by analyzing patterns in data and predicting the likeliest next words. Vendors such as OpenAI and Anthropic use "system prompts" to shape these models' behavior and guide their responses. The prompts may spell out specific traits and boundaries, for example telling a model to be polite but avoid apologizing, or restricting capabilities such as opening URLs or recognizing faces. System prompts are usually kept confidential, presumably for competitive reasons and to retain control over the models' behavior. Recently, however, Anthropic disclosed the system prompts for its three latest models (Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3 Haiku) in the Claude iOS and Android apps and on the web. As Anthropic refines and adjusts its system prompts, this move toward transparency may become standard practice. The system prompts, dated July 12, clarify limitations: Claude cannot open URLs or perform facial recognition, for example. The prompts also define certain traits and characteristics, encouraging the models to display intellectual curiosity, engage in discussion on a wide variety of topics, handle sensitive subjects with care, and avoid beginning answers with strong affirmations such as "certainly" or "absolutely." This human-written character sheet evokes an AI that is merely a tool for meeting users' conversational needs rather than an independent entity; without explicit human guidance, these models remain largely unformed, operating only on predefined instructions. By publishing detailed system prompt changelogs (a novel step for a major AI provider), Anthropic sets a precedent for greater industry transparency and puts competitive pressure on others to disclose similar information. Whether this gambit succeeds remains to be seen.

Text summary:

* Language models struggle to admit mistakes because they are not trained to acknowledge uncertainty or error.
* Introducing the term "hallucination" makes it easier for language models to acknowledge and correct mistakes.
* Words such as "confabulate" or "lie," however, could trigger an existential crisis in a language model.
* One proposed solution is a "reverse Google": a continuously updated repository of confirmed-true information that language models could draw on to form accurate answers (see the sketch after this list).
* The ideal version of a reverse Google would continuously ingest real-time content (streams, podcasts, social media platforms) to enable real-time responses.

In summary: language models need to improve how they handle errors and uncertainty. Using the word "hallucination" helps models recognize and correct mistakes effectively. To strengthen models' ability to gather reliable information, a continuously updated knowledge base dubbed a "reverse Google" has been proposed; in theory, this resource would integrate live streaming content, allowing models to generate timely, accurate responses.
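To make the "reverse Google" proposal above concrete, here is a purely hypothetical sketch; all names, types, and the naive keyword-matching logic are invented for illustration, not taken from any real system:

    import time
    from dataclasses import dataclass, field

    @dataclass
    class Fact:
        claim: str          # a confirmed-true statement
        source: str         # where the claim was verified
        ingested_at: float  # when it entered the store

    @dataclass
    class FactStore:
        """Toy 'reverse Google': a continuously updated store of verified claims."""
        facts: list[Fact] = field(default_factory=list)

        def ingest(self, claim: str, source: str) -> None:
            # In the commenters' ideal version, this would be fed by live
            # streams, podcasts, and social media in real time.
            self.facts.append(Fact(claim, source, time.time()))

        def lookup(self, query: str) -> list[Fact]:
            # Naive keyword match; a real system would need ranking and
            # verification far beyond this.
            terms = query.lower().split()
            return [f for f in self.facts
                    if any(t in f.claim.lower() for t in terms)]

    store = FactStore()
    store.ingest("Claude's published system prompts are dated July 12.", "techcrunch.com")
    for fact in store.lookup("system prompts"):
        print(fact.claim, "--", fact.source)

The hard part, which this sketch sidesteps entirely, is the verification pipeline that decides what counts as "confirmed true" before ingestion.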

Original text

Generative AI models aren’t actually humanlike. They have no intelligence or personality — they’re simply statistical systems predicting the likeliest next words in a sentence. But like interns at a tyrannical workplace, they do follow instructions without complaint — including initial “system prompts” that prime the models with their basic qualities and what they should and shouldn’t do.

Every generative AI vendor, from OpenAI to Anthropic, uses system prompts to prevent (or at least try to prevent) models from behaving badly, and to steer the general tone and sentiment of the models’ replies. For instance, a prompt might tell a model it should be polite but never apologetic, or to be honest about the fact that it can’t know everything.
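As a concrete illustration, here is a minimal sketch of how a system prompt is supplied to a model through Anthropic's Python SDK; the prompt text and the user message are invented examples in the spirit of the instructions quoted above, not any vendor's actual prompt:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The "system" parameter carries the system prompt; it primes the model
    # before any user messages are processed.
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=256,
        system="Be polite, but never apologize, and be honest about what you cannot know.",
        messages=[{"role": "user", "content": "Can you open this URL for me?"}],
    )
    print(response.content[0].text)

Presumably, the prompts Anthropic published are much longer versions of this kind of instruction string, applied within its own apps.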

But vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. The only way to expose GPT-4o's system prompt, for example, is through a prompt injection attack. And even then, the system's output can't be trusted completely.

However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.

Alex Albert, head of Anthropic’s developer relations, said in a post on X that Anthropic plans to make this sort of disclosure a regular thing as it updates and fine-tunes its system prompts.

The latest prompts, dated July 12, outline very clearly what the Claude models can’t do — e.g. “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the system prompt for Claude Opus tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”

But the prompts also describe certain personality traits and characteristics — traits and characteristics that Anthropic would have the Claude models exemplify.

The prompt for Claude 3 Opus, for instance, says that Claude is to appear as if it “[is] very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.”

It’s all a bit strange to this human, these system prompts, which are written like an actor in a stage play might write a character analysis sheet. The prompt for Opus ends with “Claude is now being connected with a human,” which gives the impression that Claude is some sort of consciousness on the other end of the screen whose only purpose is to fulfill the whims of its human conversation partners.

But of course that’s an illusion. If the prompts for Claude tell us anything, it’s that without human guidance and hand-holding, these models are frighteningly blank slates.

With these new system prompt changelogs — the first of their kind from a major AI vendor — Anthropic is exerting pressure on competitors to publish the same. We’ll have to see if the gambit works.
