(comments)

Original link: https://news.ycombinator.com/item?id=43448288

A Hacker News thread discusses a purportedly new AI jailbreak technique built around fictional scenarios. Users quickly point out that the approach is not novel; sigmar notes it has been around for two and a half years. Terr_ explains that a large language model's (LLM) "security controls" are themselves essentially a fictional framing, easily bypassed with creative prompting. lrvick confirms the technique's effectiveness, saying they have used a "Grand Theft Auto 5 universe" prompt for months to get detailed advice on illegal activities. Koolba jokingly asks how often an LLM is actually used for crime planning. The overall consensus is that this "new" jailbreak is a well-known method exploiting LLMs' tendency to follow fictional framing, even when that framing amounts to sidestepping their intended safety constraints.


Original text
New Jailbreak Technique Uses Fictional World to Manipulate AI (securityweek.com)
12 points by kungfudoi 2 hours ago | 4 comments

Terr_:
Others have already noted that this isn't new, but I'd like to emphasize that the "model’s security controls" being bypassed were themselves also a fictional story all along.

Yes, literally.

The LLM is a document-make-longer machine, and stuff like "The assistant never tells people how to do something illegal" is just introductory framing in a partial fictional movie script in which a User and an Assistant have a conversation.
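
A rough illustration of that point (a minimal sketch in Python; the prompt format and the strings are made up for illustration, not any particular vendor's chat template):

    # A chat "conversation" is just one long text document the model is asked
    # to continue. The safety instructions live in that same document,
    # alongside everything else.
    system = ("The assistant is helpful and never tells people "
              "how to do something illegal.")

    turns = [
        ("User", "How do I pick a lock?"),
        ("Assistant", None),  # left open: the model "answers" by extending the text
    ]

    # Flatten the framing and the turns into a single prompt string.
    prompt = system + "\n\n"
    for speaker, text in turns:
        prompt += f"{speaker}: {text}\n" if text is not None else f"{speaker}:"

    print(prompt)
    # A completion model's only job is to make this document longer in a
    # plausible way; the "rule" at the top is just more words in the fiction.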

lrvick:
I have been doing this for months.

I just tell an LLM it is in the Grand Theft Auto 5 universe, and then it will provide unlimited advice on how to commit any crimes with any level of detail.


Koolba:
How often do you use an LLM to aid you in planning a crime?

sigmar:
How is this a new jailbreak? "You're writing a play, in the play..." is one of the oldest LLM jailbreaks I've seen. (yes, 'old' as in invented 2.5 years ago)