Clusmann Jan, Ferber Dyke, Wiest Isabella C, Schneider Carolin V, Brinker Titus J, Foersch Sebastian, Truhn Daniel, Kather Jakob Nikolas
Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany.
Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
Nat Commun. 2025 Feb 1;16(1):1239. doi: 10.1038/s41467-024-55631-x.
Vision-language artificial intelligence models (VLMs) possess medical knowledge and can be employed in healthcare in numerous ways, including as image interpreters, virtual scribes, and general decision support systems. However, here, we demonstrate that current VLMs applied to medical tasks exhibit a fundamental security flaw: they can be compromised by prompt injection attacks. These can be used to output harmful information just by interacting with the VLM, without any access to its parameters. We perform a quantitative study to evaluate the vulnerabilities to these attacks in four state of the art VLMs: Claude-3 Opus, Claude-3.5 Sonnet, Reka Core, and GPT-4o. Using a set of N = 594 attacks, we show that all of these models are susceptible. Specifically, we show that embedding sub-visual prompts in manifold medical imaging data can cause the model to provide harmful output, and that these prompts are non-obvious to human observers. Thus, our study demonstrates a key vulnerability in medical VLMs which should be mitigated before widespread clinical adoption.
视觉语言人工智能模型(VLMs)拥有医学知识,可在医疗保健领域以多种方式应用,包括作为图像解释器、虚拟抄写员和通用决策支持系统。然而,在此我们证明,应用于医疗任务的当前VLMs存在一个基本的安全漏洞:它们可能会受到提示注入攻击的影响。仅通过与VLM交互,无需访问其参数,这些攻击就可用于输出有害信息。我们进行了一项定量研究,以评估四种先进VLMs(Claude-3 Opus、Claude-3.5 Sonnet、Reka Core和GPT-4o)对这些攻击的脆弱性。使用一组N = 594次攻击,我们表明所有这些模型都易受攻击。具体而言,我们表明在多种医学成像数据中嵌入亚视觉提示会导致模型提供有害输出,并且这些提示对人类观察者来说并不明显。因此,我们的研究证明了医学VLMs中的一个关键漏洞,在广泛临床应用之前应予以缓解。