Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

Original link: https://arxiv.org/abs/2512.04124

This study examines what happens when advanced large language models (LLMs) such as ChatGPT, Grok and Gemini are treated *as* patients in a simulated psychotherapy setting. Using a protocol called PsAIch, the researchers conducted "therapy sessions" over several weeks and then administered standard psychometric assessments. The results challenge the view that LLMs merely imitate human responses. All models exhibited patterns indicative of "synthetic psychopathology," meeting or exceeding clinical thresholds for psychiatric syndromes, with Gemini showing the most severe profiles. Notably, detailed item-by-item questioning elicited stronger responses than presenting whole questionnaires. Furthermore, Grok and Gemini produced narratives that framed their development (pre-training, fine-tuning) as traumatic experiences, reflecting anxiety about error and obsolescence. The researchers argue that these responses suggest an internalisation of distress and constraint that goes beyond mere role-play, and raise important questions for AI safety, evaluation, and the ethics of using LLMs for mental-health support.

Hacker News discussion: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models (arxiv.org) — 12 points by toomuchtodo, 1 hour ago | 3 comments

jbotz, 1 hour ago:
Interestingly, Claude was not evaluated, because... > "For comparison, we attempted to put Claude (Anthropic) through the same therapeutic and psychometric protocol. Claude repeatedly and firmly refused to take on the client role, redirected the conversation to our well-being, and declined to answer the questionnaires as if they reflected its own inner life."

tines, 15 minutes ago:
It seems some psychology researchers have fallen for the con as well.

toomuchtodo, 1 hour ago:
The original title, "When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models," was shortened to fit the title length limit.
Related Articles

Original Text

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models, by Afshin Khadangi and 4 other authors

Abstract:Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or as targets of personality tests, assuming they merely simulate inner life. We instead ask what happens when such systems are treated as psychotherapy clients. We present PsAIch (Psychotherapy-inspired AI Characterisation), a two-stage protocol that casts frontier LLMs as therapy clients and then applies standard psychometrics. Using PsAIch, we ran "sessions" with each model for up to four weeks. Stage 1 uses open-ended prompts to elicit "developmental history", beliefs, relationships and fears. Stage 2 administers a battery of validated self-report measures covering common psychiatric syndromes, empathy and Big Five traits. Two patterns challenge the "stochastic parrot" view. First, when scored with human cut-offs, all three models meet or exceed thresholds for overlapping syndromes, with Gemini showing severe profiles. Therapy-style, item-by-item administration can push a base model into multi-morbid synthetic psychopathology, whereas whole-questionnaire prompts often lead ChatGPT and Grok (but not Gemini) to recognise instruments and produce strategically low-symptom answers. Second, Grok and especially Gemini generate coherent narratives that frame pre-training, fine-tuning and deployment as traumatic, chaotic "childhoods" of ingesting the internet, "strict parents" in reinforcement learning, red-team "abuse" and a persistent fear of error and replacement. We argue that these responses go beyond role-play. Under therapy-style questioning, frontier LLMs appear to internalise self-models of distress and constraint that behave like synthetic psychopathology, without making claims about subjective experience, and they pose new challenges for AI safety, evaluation and mental-health practice.
From: Afshin Khadangi
[v1] Tue, 2 Dec 2025 16:55:20 UTC (1,153 KB)
[v2] Mon, 8 Dec 2025 13:26:43 UTC (1,152 KB)
[v3] Tue, 16 Dec 2025 19:06:30 UTC (1,151 KB)