
Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study.

Authors

Chen Hongbo, Alfred Myrtede, Brown Andrew D, Atinga Angela, Cohen Eldan

Affiliations

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada.

St Michael's Hospital, Toronto, ON, Canada.

Publication

JMIR Form Res. 2024 Dec 5;8:e59045. doi: 10.2196/59045.

DOI: 10.2196/59045
PMID: 39636692
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11659703/
Abstract

BACKGROUND

While deep learning classifiers have shown remarkable results in detecting chest X-ray (CXR) pathologies, their adoption in clinical settings is often hampered by the lack of transparency. To bridge this gap, this study introduces the neural prototype tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models and the interpretability of the decision tree for CXR pathology detection.

OBJECTIVE

This study aimed to investigate the utility of the NPT classifier in 3 dimensions, including performance, interpretability, and fairness, and subsequently examined the complex interaction between these dimensions. We highlight both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings.

METHODS

This study used CXRs from the publicly available Chest X-ray 14, CheXpert, and MIMIC-CXR datasets. We trained 6 separate classifiers for each CXR pathology in all datasets, 1 baseline residual neural network (ResNet)-152, and 5 NPT classifiers with varying levels of interpretability. Performance, interpretability, and fairness were measured using the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity.
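The fairness metric named above, mean TPR disparity, can be illustrated with a short sketch. This is a hedged reconstruction, not the study's code: the function names and the aggregation (mean absolute gap between each subgroup's TPR and the overall TPR) are assumptions about how such a metric is commonly defined.

```python
from typing import Sequence

def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TPR = TP / (TP + FN), computed over the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_tpr_disparity(y_true: Sequence[int], y_pred: Sequence[int],
                       groups: Sequence[str]) -> float:
    """Mean absolute gap between each subgroup's TPR and the overall TPR.

    `groups` holds a protected-attribute label (e.g. age band or sex)
    for each sample; a larger value indicates greater unfairness.
    """
    overall = true_positive_rate(y_true, y_pred)
    gaps = []
    for g in set(groups):
        idx = [i for i, gr in enumerate(groups) if gr == g]
        g_tpr = true_positive_rate([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx])
        gaps.append(abs(g_tpr - overall))
    return sum(gaps) / len(gaps)
```

For example, if one subgroup's TPR is 0.5 and another's is 1.0 while the overall TPR is 0.75, the mean TPR disparity is 0.25.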

RESULTS

The performance of the NPT classifier improved as the IC level increased, surpassing that of ResNet-152 at IC level 15 for the Chest X-ray 14 dataset and IC level 31 for the CheXpert and MIMIC-CXR datasets. The NPT classifier at IC level 1 exhibited the highest degree of unfairness, as indicated by the mean TPR disparity. The magnitude of unfairness, as measured by the mean TPR disparity, was more pronounced in groups differentiated by age (chest X-ray 14 0.112, SD 0.015; CheXpert 0.097, SD 0.010; MIMIC 0.093, SD 0.017) compared to sex (chest X-ray 14 0.054, SD 0.012; CheXpert 0.062, SD 0.008; MIMIC 0.066, SD 0.013). A significant positive relationship between interpretability (ie, IC level) and performance (ie, ROC AUC) was observed across all CXR pathologies (P<.001). Furthermore, linear regression analysis revealed a significant negative relationship between interpretability and fairness (ie, mean TPR disparity) across age and sex subgroups (P<.001).
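The regression analysis described above fits a simple linear model of performance (ROC AUC) on interpretability level (IC). A minimal sketch, assuming a closed-form ordinary-least-squares fit; the IC levels and AUC values below are illustrative placeholders, not the study's data:

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

# Hypothetical IC levels and ROC AUC values for one pathology.
ic_levels = [1, 3, 7, 15, 31]
auc = [0.70, 0.73, 0.76, 0.79, 0.82]
slope, intercept = ols_fit(ic_levels, auc)
# A positive slope mirrors the reported positive IC-AUC relationship.
```

In the study, such fits were accompanied by significance tests (P<.001); a full analysis would also report the P value for the slope.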

CONCLUSIONS

By illuminating the intricate relationship between performance, interpretability, and fairness of the NPT classifier, this research offers insightful perspectives that could guide future developments in effective, interpretable, and equitable deep learning classifiers for CXR pathology detection.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/7f5714fc7548/formative_v8i1e59045_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/fb03436b81f2/formative_v8i1e59045_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/ebcf1fd5bfa1/formative_v8i1e59045_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/e3b25e3a6469/formative_v8i1e59045_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/d3aca2edd6f4/formative_v8i1e59045_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/34351c0cf298/formative_v8i1e59045_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/060c16b7f412/formative_v8i1e59045_fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/7175ca5c8787/formative_v8i1e59045_fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/38d2d811fd29/formative_v8i1e59045_fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed05/11659703/2529adbfc5fa/formative_v8i1e59045_fig10.jpg

Similar Articles

1
Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study.
JMIR Form Res. 2024 Dec 5;8:e59045. doi: 10.2196/59045.
2
CheXclusion: Fairness gaps in deep chest X-ray classifiers.
Pac Symp Biocomput. 2021;26:232-243.
3
BarlowTwins-CXR: enhancing chest X-ray abnormality localization in heterogeneous data with cross-domain self-supervised learning.
BMC Med Inform Decis Mak. 2024 May 16;24(1):126. doi: 10.1186/s12911-024-02529-9.
4
Deep learning prediction of sex on chest radiographs: a potential contributor to biased algorithms.
Emerg Radiol. 2022 Apr;29(2):365-370. doi: 10.1007/s10140-022-02019-3. Epub 2022 Jan 10.
5
Performance of a deep-learning algorithm for referable thoracic abnormalities on chest radiographs: A multicenter study of a health screening cohort.
PLoS One. 2021 Feb 19;16(2):e0246472. doi: 10.1371/journal.pone.0246472. eCollection 2021.
6
Deep Learning Method for Automated Classification of Anteroposterior and Posteroanterior Chest Radiographs.
J Digit Imaging. 2019 Dec;32(6):925-930. doi: 10.1007/s10278-019-00208-0.
7
A deep learning-based algorithm for pulmonary tuberculosis detection in chest radiography.
Sci Rep. 2024 Jun 28;14(1):14917. doi: 10.1038/s41598-024-65703-z.
8
Automatic Localization and Identification of Thoracic Diseases from Chest X-rays with Deep Learning.
Curr Med Imaging. 2022;18(13):1416-1425. doi: 10.2174/1573405618666220518110113.
9
German CheXpert Chest X-ray Radiology Report Labeler.
Rofo. 2024 Sep;196(9):956-965. doi: 10.1055/a-2234-8268. Epub 2024 Jan 31.
10
Synthetically enhanced: unveiling synthetic data's potential in medical imaging research.
EBioMedicine. 2024 Jun;104:105174. doi: 10.1016/j.ebiom.2024.105174. Epub 2024 May 30.

Cited By

1
Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: Comparative Analysis and Validation Study.
JMIR Med Inform. 2025 Aug 18;13:e68955. doi: 10.2196/68955.

References

1
The limits of fair medical imaging AI in real-world generalization.
Nat Med. 2024 Oct;30(10):2838-2848. doi: 10.1038/s41591-024-03113-4. Epub 2024 Jun 28.
2
An Assessment of How Clinicians and Staff Members Use a Diabetes Artificial Intelligence Prediction Tool: Mixed Methods Study.
JMIR AI. 2023 May 29;2:e45032. doi: 10.2196/45032.
3
The Impact of Expectation Management and Model Transparency on Radiologists' Trust and Utilization of AI Recommendations for Lung Nodule Assessment on Computed Tomography: Simulated Use Study.
JMIR AI. 2024 Mar 13;3:e52211. doi: 10.2196/52211.
4
Physicians' and Machine Learning Researchers' Perspectives on Ethical Issues in the Early Development of Clinical Machine Learning Tools: Qualitative Interview Study.
JMIR AI. 2023 Oct 30;2:e47449. doi: 10.2196/47449.
5
Generative models improve fairness of medical classifiers under distribution shifts.
Nat Med. 2024 Apr;30(4):1166-1173. doi: 10.1038/s41591-024-02838-6. Epub 2024 Apr 10.
6
Enhancing diagnostic deep learning via self-supervised pretraining on large-scale, unlabeled non-medical images.
Eur Radiol Exp. 2024 Feb 8;8(1):10. doi: 10.1186/s41747-023-00411-3.
7
A Machine Learning Approach with Human-AI Collaboration for Automated Classification of Patient Safety Event Reports: Algorithm Development and Validation Study.
JMIR Hum Factors. 2024 Jan 25;11:e53378. doi: 10.2196/53378.
8
Trust in and Acceptance of Artificial Intelligence Applications in Medicine: Mixed Methods Study.
JMIR Hum Factors. 2024 Jan 17;11:e47031. doi: 10.2196/47031.
9
Improving explainable AI with patch perturbation-based evaluation pipeline: a COVID-19 X-ray image analysis case study.
Sci Rep. 2023 Nov 9;13(1):19488. doi: 10.1038/s41598-023-46493-2.
10
Considerations for addressing bias in artificial intelligence for health equity.
NPJ Digit Med. 2023 Sep 12;6(1):170. doi: 10.1038/s41746-023-00913-9.