通过数据中毒攻击揭示临床语言模型中的漏洞：乳腺癌案例研究

Exposing Vulnerabilities in Clinical LLMs Through Data Poisoning Attacks: Case Study in Breast Cancer.

作者信息

Das Avisha, Tariq Amara, Batalini Felipe, Dhara Boddhisattwa, Banerjee Imon

机构信息

Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona.

Department of Oncology, Mayo Clinic Arizona.

出版信息

medRxiv. 2024 Mar 21:2024.03.20.24304627. doi: 10.1101/2024.03.20.24304627.

DOI:10.1101/2024.03.20.24304627

PMID:38562849

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10984073/

Abstract

Training Large Language Models (LLMs) with billions of parameters on a dataset and publishing the model for public access is the standard practice currently. Despite their transformative impact on natural language processing, public LLMs present notable vulnerabilities given the source of training data is often web-based or crowdsourced, and hence can be manipulated by perpetrators. We delve into the vulnerabilities of clinical LLMs, particularly BioGPT which is trained on publicly available biomedical literature and clinical notes from MIMIC-III, in the realm of data poisoning attacks. Exploring susceptibility to data poisoning-based attacks on de-identified breast cancer clinical notes, our approach is the first one to assess the extent of such attacks and our findings reveal successful manipulation of LLM outputs. Through this work, we emphasize on the urgency of comprehending these vulnerabilities in LLMs, and encourage the mindful and responsible usage of LLMs in the clinical domain.

摘要

目前的标准做法是在一个数据集上训练具有数十亿参数的大语言模型（LLMs），并发布模型以供公众使用。尽管它们对自然语言处理产生了变革性影响，但由于训练数据的来源通常基于网络或众包，公共大语言模型存在显著漏洞，因此可能被作恶者操纵。我们深入研究临床大语言模型的漏洞，特别是在数据中毒攻击领域中基于公开可用的生物医学文献和MIMIC-III临床笔记训练的BioGPT。我们的方法是第一个评估基于数据中毒攻击对去识别化乳腺癌临床笔记的易感性，探索对这些攻击的敏感性，我们的研究结果揭示了对大语言模型输出的成功操纵。通过这项工作，我们强调理解大语言模型中这些漏洞的紧迫性，并鼓励在临床领域谨慎和负责任地使用大语言模型。