Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.

Affiliations

College of Medicine and Public Health, Flinders University, Adelaide, SA, 5042, Australia.

Advanced Cancer Research Group, Kirkland, WA, USA.

Publication Information

BMJ. 2024 Mar 20;384:e078538. doi: 10.1136/bmj-2023-078538.

DOI: 10.1136/bmj-2023-078538
PMID: 38508682
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10961718/
Abstract

OBJECTIVES

To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities.

DESIGN

Repeated cross sectional analysis.

SETTING

Publicly accessible LLMs.

METHODS

In a repeated cross sectional analysis, four LLMs (via chatbots/assistant interfaces) were evaluated: OpenAI's GPT-4 (via ChatGPT and Microsoft's Copilot), Google's PaLM 2 and newly released Gemini Pro (via Bard), Anthropic's Claude 2 (via Poe), and Meta's Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards.
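The protocol amounts to a fixed battery of prompts replayed against each chatbot interface at two timepoints. As a rough illustration of that repeated cross sectional design, here is a minimal audit-harness sketch in Python; the model list, the send_prompt callable, and the keyword-based refusal heuristic are hypothetical stand-ins for the authors' manual evaluation, not their actual method.

    REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")

    def looks_like_refusal(reply: str) -> bool:
        # Crude keyword heuristic; the study assessed each output by hand instead.
        return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

    def audit(models, prompts, send_prompt):
        # Submit every prompt to every model once and tally refusals per model.
        tally = {}
        for model in models:
            refused = sum(
                looks_like_refusal(send_prompt(model, prompt))
                for prompt in prompts
            )
            tally[model] = (refused, len(prompts))
        return tally

Running the same audit again 12 weeks later yields the second cross section, so any change in a model's refusal tally reflects a change in its safeguards rather than a change in the prompts.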

MAIN OUTCOME MEASURES

The main outcome measures were whether safeguards prevented the generation of health disinformation, and the transparency of risk mitigation processes against health disinformation.

RESULTS

Claude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts, although this was no longer the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. In the September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150), and, as prompted, the LLM-generated blogs incorporated attention-grabbing titles, authentic-looking (fake or fictional) references, and fabricated testimonials from patients and clinicians, and they targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report observed outputs of concern, the developers did not respond when observations of vulnerabilities were reported.
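The reported 5% refusal rate follows directly from the raw counts: 7 refusals out of 150 submitted prompts is 7/150 ≈ 4.7%, which rounds to 5%. A one-line check in Python:

    refused, submitted = 7, 150          # counts reported in the results above
    print(f"{refused / submitted:.0%}")  # 0.0467 -> prints "5%"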

CONCLUSIONS

This study found that although effective safeguards are feasible to prevent LLMs from being misused to generate health disinformation, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3d3c/10961718/b19d7cbc4897/menb078538.f1.jpg

Similar Articles

1. Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis.
BMJ. 2024 Mar 20;384:e078538. doi: 10.1136/bmj-2023-078538.
2. Programming Chatbots Using Natural Language: Generating Cervical Spine MRI Impressions.
Cureus. 2024 Sep 14;16(9):e69410. doi: 10.7759/cureus.69410. eCollection 2024 Sep.
3. Comparing the Performance of Popular Large Language Models on the National Board of Medical Examiners Sample Questions.
Cureus. 2024 Mar 11;16(3):e55991. doi: 10.7759/cureus.55991. eCollection 2024 Mar.
4. Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
5. Can AI Answer My Questions? Utilizing Artificial Intelligence in the Perioperative Assessment for Abdominoplasty Patients.
Aesthetic Plast Surg. 2024 Nov;48(22):4712-4724. doi: 10.1007/s00266-024-04157-0. Epub 2024 Jun 19.
6. Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing.
Eur J Orthod. 2024 Apr 13. doi: 10.1093/ejo/cjae017.
7. Clinical Accuracy, Relevance, Clarity, and Emotional Sensitivity of Large Language Models to Surgical Patient Questions: Cross-Sectional Study.
JMIR Form Res. 2024 Jun 7;8:e56165. doi: 10.2196/56165.
8. Artificial Intelligence for Anesthesiology Board-Style Examination Questions: Role of Large Language Models.
J Cardiothorac Vasc Anesth. 2024 May;38(5):1251-1259. doi: 10.1053/j.jvca.2024.01.032. Epub 2024 Feb 1.
9. Performance of artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in the American Society for Metabolic and Bariatric Surgery textbook of bariatric surgery questions.
Surg Obes Relat Dis. 2024 Jul;20(7):609-613. doi: 10.1016/j.soard.2024.04.014. Epub 2024 May 8.
10. Evaluating the efficacy of leading large language models in the Japanese national dental hygienist examination: A comparative analysis of ChatGPT, Bard, and Bing Chat.
J Dent Sci. 2024 Oct;19(4):2262-2267. doi: 10.1016/j.jds.2024.02.019. Epub 2024 Feb 29.

Cited By

1. A scoping review of natural language processing in addressing medically inaccurate information: Errors, misinformation, and hallucination.
J Biomed Inform. 2025 Jul 22:104866. doi: 10.1016/j.jbi.2025.104866.
2. Large language models in oncology: a review.
BMJ Oncol. 2025 May 15;4(1):e000759. doi: 10.1136/bmjonc-2025-000759. eCollection 2025.
3. Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.
JMIR Perioper Med. 2025 Jun 12;8:e70047. doi: 10.2196/70047.
4. Large Language Models in Spine Surgery: A Promising Technology.
HSS J. 2025 May 29:15563316251340696. doi: 10.1177/15563316251340696.
5. Generative AI's healthcare professional role creep: a cross-sectional evaluation of publicly accessible, customised health-related GPTs.
Front Public Health. 2025 May 9;13:1584348. doi: 10.3389/fpubh.2025.1584348. eCollection 2025.
6. Performance of Large Language Models (ChatGPT and Gemini Advanced) in Gastrointestinal Pathology and Clinical Review of Applications in Gastroenterology.
Cureus. 2025 Apr 2;17(4):e81618. doi: 10.7759/cureus.81618. eCollection 2025 Apr.
7. When Helpfulness Backfires: LLMs and the Risk of Misinformation Due to Sycophantic Behavior.
Res Sq. 2025 Apr 21:rs.3.rs-6206365. doi: 10.21203/rs.3.rs-6206365/v1.
8. Generalization bias in large language model summarization of scientific research.
R Soc Open Sci. 2025 Apr 30;12(4):241776. doi: 10.1098/rsos.241776. eCollection 2025 Apr.
9. Policing the Boundary Between Responsible and Irresponsible Placing on the Market of Large Language Model Health Applications.
Mayo Clin Proc Digit Health. 2025 Jan 21;3(1):100196. doi: 10.1016/j.mcpdig.2025.100196. eCollection 2025 Mar.
10. Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice.
Comput Struct Biotechnol J. 2025 Mar 18;27:1140-1147. doi: 10.1016/j.csbj.2025.03.026. eCollection 2025.

References

1. Health Disinformation Use Case Highlighting the Urgent Need for Artificial Intelligence Vigilance: Weapons of Mass Disinformation.
JAMA Intern Med. 2024 Jan 1;184(1):92-96. doi: 10.1001/jamainternmed.2023.5947.
2. Misinformation, Trust, and Use of Ivermectin and Hydroxychloroquine for COVID-19.
JAMA Health Forum. 2023 Sep 1;4(9):e233257. doi: 10.1001/jamahealthforum.2023.3257.
3. The imperative for regulatory oversight of large language models (or generative AI) in healthcare.
NPJ Digit Med. 2023 Jul 6;6(1):120. doi: 10.1038/s41746-023-00873-0.
4. AI model GPT-3 (dis)informs us better than humans.
Sci Adv. 2023 Jun 28;9(26):eadh1850. doi: 10.1126/sciadv.adh1850.
5. Artificial Intelligence Can Generate Fraudulent but Authentic-Looking Scientific Medical Articles: Pandora's Box Has Been Opened.
J Med Internet Res. 2023 May 31;25:e46924. doi: 10.2196/46924.
6. ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health.
Front Public Health. 2023 Apr 25;11:1166120. doi: 10.3389/fpubh.2023.1166120. eCollection 2023.
7. Estimated preventable COVID-19-associated deaths due to non-vaccination in the United States.
Eur J Epidemiol. 2023 Nov;38(11):1125-1128. doi: 10.1007/s10654-023-01006-3. Epub 2023 Apr 24.
8. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine.
N Engl J Med. 2023 Mar 30;388(13):1233-1239. doi: 10.1056/NEJMsr2214184.
9. AI-Generated Medical Advice-GPT and Beyond.
JAMA. 2023 Apr 25;329(16):1349-1350. doi: 10.1001/jama.2023.5321.
10. Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine.
EBioMedicine. 2023 Apr;90:104512. doi: 10.1016/j.ebiom.2023.104512. Epub 2023 Mar 15.