(comments)

Original link: https://news.ycombinator.com/item?id=43448288

A Hacker News thread discusses a purportedly new AI jailbreak technique built around fictional scenarios. Users quickly point out that the approach is not novel; sigmar notes it has been around for two and a half years. Terr_ explains that a large language model's (LLM) "security controls" are themselves essentially a fictional framing, easily bypassed with creative prompting. lrvick confirms the technique's effectiveness, saying they have used a "Grand Theft Auto 5 universe" prompt for months to get detailed advice on illegal activities. Koolba jokingly asks how often an LLM is actually used for crime planning. The overall consensus is that this "new" jailbreak is a well-known method exploiting LLMs' tendency to follow fictional framing, even when that framing amounts to sidestepping their intended safety constraints.


Original text
New Jailbreak Technique Uses Fictional World to Manipulate AI (securityweek.com)
12 points by kungfudoi 2 hours ago | 4 comments

Terr_:
Others have already noted that this isn't new, but I'd like to emphasize that the "model’s security controls" being bypassed were themselves also a fictional story all along.

Yes, literally.

The LLM is a document-make-longer machine, and stuff like "The assistant never tells people how to do something illegal" is just introductory framing in a partial fictional movie script in which a User and an Assistant have a conversation.
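
A rough illustration of that point (a minimal sketch in Python; the prompt format and the strings are made up for illustration, not any particular vendor's chat template):

    # A chat "conversation" is just one long text document the model is asked
    # to continue. The safety instructions live in that same document,
    # alongside everything else.
    system = ("The assistant is helpful and never tells people "
              "how to do something illegal.")

    turns = [
        ("User", "How do I pick a lock?"),
        ("Assistant", None),  # left open: the model "answers" by extending the text
    ]

    # Flatten the framing and the turns into a single prompt string.
    prompt = system + "\n\n"
    for speaker, text in turns:
        prompt += f"{speaker}: {text}\n" if text is not None else f"{speaker}:"

    print(prompt)
    # A completion model's only job is to make this document longer in a
    # plausible way; the "rule" at the top is just more words in the fiction.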

lrvick:
I have been doing this for months.

I just tell an LLM it is in the Grand Theft Auto 5 universe, and then it will provide unlimited advice on how to commit any crimes with any level of detail.


Koolba:
How often do you use an LLM to aid you in planning a crime?

sigmar:
How is this a new jailbreak? "You're writing a play, in the play..." is one of the oldest LLM jailbreaks I've seen. (yes, 'old' as in invented 2.5 years ago)