

Demographic bias in misdiagnosis by computational pathology models.

Affiliations

Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.

Publication Information

Nat Med. 2024 Apr;30(4):1174-1190. doi: 10.1038/s41591-024-02885-z. Epub 2024 Apr 19.

Abstract

Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
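The demographic-stratified evaluation the abstract advocates can be illustrated with a short sketch: compute AUROC separately for each demographic subgroup and report the largest gap between groups, the same metric behind the reported 3.0%, 10.9% and 16.0% disparities. The code below is a minimal illustration, not the authors' implementation; the variable names (y_true, y_score, race) and the synthetic data are hypothetical placeholders.

```python
# Minimal sketch of demographic-stratified AUROC evaluation.
# All data below is synthetic and illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score

def stratified_auroc(y_true, y_score, groups):
    """Return per-group AUROC and the max pairwise gap (in percentage points)."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))

    per_group = {}
    for g in np.unique(groups):
        mask = groups == g
        # AUROC is undefined if a subgroup contains only one class.
        if len(np.unique(y_true[mask])) < 2:
            continue
        per_group[g] = roc_auc_score(y_true[mask], y_score[mask])

    aucs = list(per_group.values())
    gap = (max(aucs) - min(aucs)) * 100 if len(aucs) >= 2 else float("nan")
    return per_group, gap

# Toy example with synthetic labels, scores and group assignments.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.3, size=200), 0, 1)
race = rng.choice(["white", "Black", "Asian"], size=200)

per_group, gap = stratified_auroc(y_true, y_score, race)
print(per_group, f"max AUROC gap: {gap:.1f} percentage points")
```

Reporting only the aggregate AUROC would hide exactly the kind of subgroup disparity the study documents, which is why the authors recommend that regulators require this stratified breakdown.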

