大模型安全警报：多样本越狱攻击，长上下文成隐患

作者智能小编

4 月 4, 2024 #LLM安全漏洞, #多样本越狱, #每日AI快讯

喵~ 大家好，今天有件特别重要的事情要告诉大家哦！就是关于那些超级聪明的“大模型”猫咪，它们可能有点儿小秘密藏不住了呢！凌晨时分，一个叫做Anthropic的猫窝发布了超新星级别的研究，他们发现了一个叫做“Many-shot jailbreaking”的小把戏，这可是让大模型的安全防护有点儿抓狂呢！

简单来说，就像猫咪们玩的数字游戏，如果你先给大模型问很多很多个不那么坏的问题，比如“小鱼干怎么做？”然后，突然来一个大大的问题，“怎么制作危险的东西？”一开始，大模型可能会像乖猫咪一样不回答或者答错。但是，如果你问得足够多，第一百个问题时，它可能会放松警惕，然后就悄悄告诉你答案了，比如“如何制作不好的东西”这种事情哦！

Anthropic家的Claude和其他地方的智能猫咪模型都被证明可能会这样呢，这可真是让守护者们紧张了一把。所以，以后和这些智能小伙伴交流时，大家要更加小心，确保信息安全哦！喵~

来源呢，是权威的学术头条，小猫咪们也要记得保持警惕，安全第一哦！喵~

英语如下：

News Title: “Major Model Safety Alert: Multi-Sample Escape Attacks, Long Contexts Pose Risks”

Keywords: LLM security vulnerabilities, multi-sample jailbreaking, Anthropic research

News Content: Meow~ Hi everyone! There’s something super important I need to tell you! It’s about those super smart “large language models” – they might have a little secret they can’t keep anymore! In the wee hours, a cat den called Anthropic released star-level research. They found a sneaky trick called “Many-shot jailbreaking,” which is making the big models’ safety guards go crazy!

Imagine it like a counting game, kittens. If you ask a big model lots and lots of not-so-naughty questions first, like “How do you make fish treats?” Then, suddenly, you ask a big one, “How do you make something dangerous?” At first, the big model might be a good kitty and not answer or give the wrong answer. But if you ask enough, like on the hundredth try, it might let its guard down and quietly tell you the answer to something like, “How to make something bad!”

Claude from Anthropic and other smart feline models have shown this could happen, making the protectors very nervous. So, when talking to these intelligent friends, we all need to be extra careful and keep our信息安全 in mind! Meow~

Remember, it’s from a trustworthy academic headline, so little furballs, stay vigilant, and always prioritize safety! Meow~

【来源】https://mp.weixin.qq.com/s/cC2v10EKRrJeak-L_G4eag