The Hidden Threat of Hallucinations in Binary Chest X-ray Pneumonia Classification.

Authors

Rajaraman Sivaramakrishnan, Liang Zhaohui, Marini Niccolo, Xue Zhiyun, Antani Sameer

Affiliation

Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.

Publication information

Proc IEEE Int Symp Comput Based Med Syst. 2025 Jun;2025:668-673. doi: 10.1109/cbms65348.2025.00138. Epub 2025 Jul 4.

DOI:10.1109/cbms65348.2025.00138

PMID:40852408

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12369649/

Abstract

Hallucination in deep learning (DL) classification, where DL models yield confidently erroneous predictions, remains a pressing concern. This study investigates whether binary classifiers truly learn disease-specific features when distinguishing overlapping radiological presentations among pneumonia subtypes on chest X-ray (CXR) images. Specifically, we evaluate whether an uncertainty measure is a valuable tool for classifying signs of different pathogen-specific subtypes of pneumonia. We evaluated two binary classifiers trained to separate bacterial pneumonia and viral pneumonia, respectively, from normal CXRs. A third classifier explored the ability to distinguish bacterial from viral pneumonia presentations, to highlight our concern regarding the hallucinations observed in the former cases. Our analysis computes the Matthews Correlation Coefficient and prediction entropy on a pediatric CXR dataset and reveals that the normal/bacterial and normal/viral classifiers consistently and confidently misclassify the unseen pneumonia subtype as their respective disease class. These findings expose a critical limitation: binary classifiers tend to hallucinate by relying on general pneumonia indicators rather than pathogen-specific patterns, which challenges their utility in clinical workflows.
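The two metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation the authors used, so this assumes the standard binary Matthews Correlation Coefficient and the Shannon entropy (in bits) of a classifier's predicted disease probability. The function names and the sample probabilities are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard binary MCC for labels in {0, 1}; returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def binary_entropy(p):
    """Shannon entropy (bits) of a prediction with P(disease) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries zero entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical outputs of a normal/bacterial classifier shown viral
# pneumonia images it never saw during training (illustrative values only).
unseen_subtype_probs = [0.97, 0.99, 0.95]
entropies = [binary_entropy(p) for p in unseen_subtype_probs]
mean_entropy = sum(entropies) / len(entropies)
```

Under these assumptions, a mean entropy close to 0 bits on the unseen subtype would indicate the pattern the abstract describes: the classifier assigns the unfamiliar subtype to its own disease class with high confidence, so entropy alone would not flag the error.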
