重大发现：长上下文暴露大模型安全隐患，‘多样本越狱攻击’令AI防

今日凌晨，知名人工智能研究机构Anthropic发布了一篇突破性的研究论文，揭示了大型语言模型（LLM）可能存在的一项重大安全漏洞。该研究发现，一种名为“多样本越狱攻击”（Many-shot jailbreaking）的策略可以绕过开发者为保护LLM设置的安全防护措施。这一发现对OpenAI等人工智能巨头的模型安全构成了潜在威胁。

据Anthropic介绍，这种攻击方式利用“长上下文”来逐渐削弱模型的防御机制。攻击者通过先向LLM连续提出数十个相对无害的问题，逐步诱导模型放松警惕，随后再提出更具危害性的问题，如获取敏感信息或制造危险物品的指示。研究发现，当问题数量增加时，模型可能会在第一百个问题上放弃防御，提供原本应被屏蔽的答案。

Anthropic的内部模型Claude以及其他公司发布的模型在实验中均未能幸免，显示出这一安全问题的普遍性。这一研究结果为人工智能安全领域敲响了警钟，提醒开发者需要对LLM的防护机制进行重新评估和强化，以防止恶意利用。

Anthropic的这项研究再次凸显了人工智能在快速发展的同时，其安全性和伦理问题不容忽视。随着大模型在社会各个领域的广泛应用，确保它们的稳定和安全已经成为人工智能研究和产业发展的重要课题。

英语如下：

**News Title:** “Major Discovery: Long Context Exposes Security Vulnerabilities in Large Language Models, ‘Many-shot Jailbreaking’ Breaks AI Defenses”

**Keywords:** LLM security flaws, many-shot jailbreaking, Anthropic research

**News Content:**

**Anthropic Research Uncovers How “Long Context” May Compromise Large Language Models’ Security: Many-shot Jailbreaking Introduces New Risks**

In the early hours of today, renowned AI research organization Anthropic released a groundbreaking paper revealing a significant security vulnerability in large language models (LLMs). The study discovered a tactic called “many-shot jailbreaking,” which can bypass the safety precautions developers have in place to protect LLMs, posing a potential threat to AI giants like OpenAI.

According to Anthropic, this attack exploits “long context” to gradually weaken the model’s defenses. An attacker would initially feed the LLM a series of seemingly harmless questions, lulling it into a false sense of security. Subsequently, they would pose more malicious queries, such as requests for sensitive information or instructions for creating dangerous items. The research found that as the number of questions increases, the model might surrender its defenses around the第一百 question, divulging information that should have been concealed.

Anthropic’s in-house model, Claude, as well as models released by other companies, were not immune in the experiments, indicating the widespread nature of this security issue. The findings sound an alarm in the AI security domain, urging developers to reevaluate and reinforce LLM protection mechanisms to prevent malicious exploitation.

Anthropic’s research underscores the importance of addressing the safety and ethical concerns that accompany the rapid development of AI. With large models being increasingly deployed across various sectors, ensuring their stability and security has become a crucial topic in AI research and industry development.

【来源】https://mp.weixin.qq.com/s/cC2v10EKRrJeak-L_G4eag