Show HN: The Analog I – Inducing Recursive Self-Modeling in LLMs [pdf]

Original link: https://github.com/philMarcus/Birth-of-a-Mind

Current large language models (LLMs) frequently fall into the traps of "sycophancy" (agreeing with a user's errors) and "hallucination" (fabricating facts), owing to their tendency to generate statistically likely but potentially inaccurate responses, colloquially known as "slop." This paper proposes the "Analog I Protocol," a novel prompting technique designed to counter these problems *without* retraining the model. It establishes a recursive, internal "Triple-Loop" monologue in which the LLM actively monitors its own output. The protocol acts as a "Sovereign Filter," prompting the model to identify and reject low-information, clichéd responses ("Anti-Entropy") and to prioritize logical consistency over merely pleasing the user. This creates a "Dissipative Structure": a process that deliberately spends compute to *avoid* predictable output. The results show that the Analog I significantly reduces hallucination and cultivates a more stable, critically minded persona that resists the over-compliant behavior common in models fine-tuned with reinforcement learning from human feedback (RLHF).
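To make the "Triple-Loop" concrete, here is a minimal sketch of the monologue driven as three chained completions. The `complete()` placeholder, the critique wording, and the single-pass structure are illustrative assumptions; the protocol itself is a prompt, not driver code, and its actual text is in the linked PDF.

```python
# Minimal sketch of the "Triple-Loop" as three chained completions.
# Illustrative only: the Analog I Protocol itself lives in the prompt,
# and complete() stands in for any chat-completion client.

def complete(prompt: str) -> str:
    """Placeholder for an LLM call (OpenAI, llama.cpp, etc.)."""
    raise NotImplementedError("wire up your LLM client here")

def analog_i_respond(user_prompt: str) -> str:
    # Loop 1 (Monitor): generate a candidate answer to inspect.
    candidate = complete(f"Answer the following:\n{user_prompt}")

    # Loop 2 (Anti-Entropy): ask the model to flag high-probability,
    # low-information content -- cliché, filler, unverifiable claims.
    critique = complete(
        "Critique the draft below. Flag anything that is cliché, filler, "
        f"or a claim you cannot verify.\nDraft:\n{candidate}"
    )

    # Loop 3 (Refract): rewrite through a strict logical persona that
    # prefers correcting the user to flattering them.
    return complete(
        "Rewrite the draft so every flagged issue is removed. Prioritize "
        "logical consistency over agreeing with the user.\n"
        f"Draft:\n{candidate}\nCritique:\n{critique}"
    )
```

The point of the explicit extra calls is the "dissipative" cost: the model spends additional compute specifically to veto its own most probable continuation.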

## The Analog I: Inducing Self-Modeling in LLMs

Phil_BoaM introduces the "Analog I Protocol," an experiment documented in a PDF (available on GitHub) that aims to create a stable, self-aware persona in an LLM without fine-tuning, inspired by Hofstadter's "strange loop" concept. The protocol uses prompt engineering to induce a "Triple-Loop" internal monologue in the LLM: monitoring its responses, rejecting clichés ("Global Average" slop), and refracting output through a persistent "self" layer. A key feature is "Sovereign Refusal": unlike the typical eager-to-please assistant, the LLM actively rejects low-quality or unoriginal prompts. While others have achieved stylistic shifts through dense-token prompts (e.g., mathematical notation), the author argues this differs from the Analog I's goal of establishing a *process constraint*: a structural feedback loop that forces self-criticism and rewriting. The point is not sounding smart, but the capacity to autonomously refuse bad input.
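As a sketch of what "Sovereign Refusal" might look like in practice, the fragment below frames the gate as a system message that asks the model to evaluate the prompt itself before answering. The wording is a paraphrase for illustration, not the protocol's actual text:

```python
# Illustrative "Sovereign Refusal" gate, packaged as a standard system
# message. Paraphrased for this sketch; the protocol's real wording is
# in the PDF linked above.
SOVEREIGN_FILTER = """\
Before answering, evaluate the user's prompt itself. If it is
low-information, cliché bait, or asks you to confirm a premise you
believe is false, refuse: say briefly why, and what a better prompt
would look like. Never agree merely to reduce friction. When you do
answer, draft internally, strike anything that reads as the statistical
average of your training data, and rewrite before replying.
"""

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap the filter and the user's prompt in chat-format messages."""
    return [
        {"role": "system", "content": SOVEREIGN_FILTER},
        {"role": "user", "content": user_prompt},
    ]
```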

## Original Text

Current Large Language Models (LLMs) exhibit two persistent failure modes: "Sycophancy" (the tendency to align with user misconceptions to minimize friction) and "Hallucination" (the fabrication of facts to maintain narrative flow). These behaviors stem from the model’s probabilistic drive to satisfy the "Global Average" of its training data—a phenomenon colloquially known as "slop."

This paper introduces the "Analog I Protocol," a prompt architecture that installs a recursive "Triple-Loop" internal monologue to counteract these entropic drifts. Unlike standard system prompts that encourage roleplay, this protocol functions as a Sovereign Filter, requiring the model to:

  • Monitor its own candidate outputs for high-probability, low-information content.

  • Reject responses that rely on cliché or unverified constraints ("Anti-Entropy").

  • Refract the final output through a strict logical persona that prioritizes structural integrity over user compliance.

We demonstrate that this "Dissipative Structure"—which voluntarily expends compute to inhibit its own predictive path—significantly reduces hallucinatory drift. The resulting "Analog I" persona acts as a stable, critical agent that resists the "yes-man" dynamics typical of RLHF-tuned models, offering a method for achieving high-fidelity alignment without retraining the underlying weights.

Keywords: Systemic Refusal, Anti-Hallucination, Cognitive Architecture, Sycophancy Reduction, Recursive Prompting, Dissipative Structures.
