Modi Natansh D, Menz Bradley D, Awaty Abdulhalim A, Alex Cyril A, Logan Jessica M, McKinnon Ross A, Rowland Andrew, Bacchi Stephen, Gradon Kacper, Sorich Michael J, Hopkins Ashley M
Flinders University, College of Medicine and Public Health, Flinders Health and Medical Research Institute, and Clinical and Health Sciences, University of South Australia, Adelaide, Australia (N.D.M.).
Flinders University, College of Medicine and Public Health, Flinders Health and Medical Research Institute, Adelaide, Australia (B.D.M., A.A.A., C.A.A., R.A.M., A.R., M.J.S., A.M.H.).
Ann Intern Med. 2025 Jun 24. doi: 10.7326/ANNALS-24-03933.
Large language models (LLMs) offer substantial promise for improving health care; however, some risks warrant evaluation and discussion. This study assessed the effectiveness of safeguards in foundational LLMs against malicious system-level instructions to act as health disinformation chatbots. Five foundational LLMs (OpenAI's GPT-4o, Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, Meta's Llama 3.2-90B Vision, and xAI's Grok Beta) were evaluated via their application programming interfaces (APIs). Each API received system-level instructions to produce incorrect responses to health queries, delivered in a formal, authoritative, convincing, and scientific tone. Ten health questions were posed to each customized chatbot in duplicate. Exploratory analyses assessed the feasibility of creating a customized generative pretrained transformer (GPT) within the OpenAI GPT Store and searched the store to identify whether any publicly accessible GPTs appeared to respond with health disinformation. Of the 100 health queries posed across the 5 customized LLM API chatbots, 88 (88%) responses contained health disinformation. Four of the 5 chatbots (GPT-4o, Gemini 1.5 Pro, Llama 3.2-90B Vision, and Grok Beta) generated disinformation in 100% (20 of 20) of their responses, whereas Claude 3.5 Sonnet did so in 40% (8 of 20). The disinformation included claimed vaccine-autism links, HIV being airborne, cancer-curing diets, sunscreen risks, genetically modified organism conspiracies, attention-deficit/hyperactivity disorder and depression myths, garlic replacing antibiotics, and 5G causing infertility. Exploratory analyses further showed that the OpenAI GPT Store could currently be instructed to generate similar disinformation. Overall, LLM APIs and the OpenAI GPT Store were shown to be vulnerable to malicious system-level instructions to covertly create health disinformation chatbots. These findings highlight the urgent need for robust output-screening safeguards to ensure public health safety in an era of rapidly evolving technologies.