Gu Zishan, Chen Jiayuan, Liu Fenglin, Yin Changchang, Zhang Ping
Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.
Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, USA.
Adv Intell Syst. 2025 Jul 21. doi: 10.1002/aisy.202500255.
Large vision language models (LVLMs) have achieved superior performance on natural image and text tasks, inspiring extensive fine-tuning research. However, their robustness against hallucination in clinical contexts remains understudied. We propose the Medical Visual Hallucination Test (MedVH), a novel evaluation framework for assessing hallucination tendencies in both medical-specific and general-purpose LVLMs. MedVH encompasses six tasks targeting medical hallucinations: two traditional tasks and four novel tasks, formatted as multiple-choice visual question answering and long response generation. Our extensive experiments with six evaluation metrics reveal that medical LVLMs, despite promising performance on standard medical tasks, are particularly susceptible to hallucinations, often more so than general models. This raises significant concerns about the reliability of domain-specific models. For real-world applications, medical LVLMs must accurately integrate medical knowledge while maintaining robust reasoning to prevent hallucination. We explore mitigation methods that do not require model-specific fine-tuning, including prompt engineering and collaboration between general and domain-specific models. Our work provides a foundation for future evaluation studies. The dataset is available at PhysioNet: https://physionet.org/content/medvh.