Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging.

Authors

Oakden-Rayner Luke, Dunnmon Jared, Carneiro Gustavo, Ré Christopher

Affiliations

Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia.

Department of Computer Science, Stanford University, Stanford, California, USA.

Publication information

Proc ACM Conf Health Inference Learn (2020). 2020 Apr;2020:151-159. doi: 10.1145/3368555.3384468.

Abstract

Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model may still consistently miss a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring hidden stratification effects, and characterize these effects both via synthetic experiments on the CIFAR-100 benchmark dataset and on multiple real-world medical imaging datasets. Using these measurement techniques, we find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we discuss the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging.
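
As a rough illustration of the kind of subset-level measurement the abstract describes, the sketch below compares sensitivity on each subset of positive cases with sensitivity on the whole test set and reports the relative difference. It is not the authors' method: the function names, the choice of sensitivity as the metric, the subset tags ("common", "rare", "none"), and the toy data are all assumptions made for illustration, and the sketch presumes that subset tags are already available for the test cases.

```python
from collections import defaultdict


def sensitivity(y_true, y_pred):
    """Fraction of positive cases (label 1) that the model predicts as 1."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        raise ValueError("no positive cases to evaluate")
    return sum(positives) / len(positives)


def subset_report(y_true, y_pred, subsets):
    """Compare per-subset sensitivity with overall sensitivity.

    Returns (overall, report), where report maps each subset name to
    (subset_sensitivity, relative_difference) and
    relative_difference = (subset_sensitivity - overall) / overall.
    Only positive cases contribute, so subsets without positives are omitted.
    """
    overall = sensitivity(y_true, y_pred)
    if overall == 0:
        raise ValueError("overall sensitivity is 0; relative difference undefined")

    # Collect the predictions made on the positive cases of each subset.
    grouped = defaultdict(list)
    for t, p, s in zip(y_true, y_pred, subsets):
        if t == 1:
            grouped[s].append(p)

    report = {}
    for name, preds in grouped.items():
        sens = sum(preds) / len(preds)
        report[name] = (sens, (sens - overall) / overall)
    return overall, report


# Hypothetical toy data: 20 positive cases (18 of a common subtype, 2 of a
# rare subtype) followed by 5 negative cases.
y_true = [1] * 20 + [0] * 5
subsets = ["common"] * 18 + ["rare"] * 2 + ["none"] * 5
y_pred = [1] * 17 + [0] + [1, 0] + [0] * 5

overall, report = subset_report(y_true, y_pred, subsets)
print(f"overall sensitivity: {overall:.2f}")
for name, (sens, rel) in sorted(report.items()):
    print(f"{name}: sensitivity {sens:.2f}, relative difference {rel:+.0%}")
```

In this toy data the model detects 18 of 20 positives overall but only 1 of the 2 rare-subtype cases, so an aggregate sensitivity of 0.90 sits alongside a rare-subtype sensitivity of 0.50. With so few cases in a subset, such an estimate is very noisy, which is one reason effects of this kind are hard to measure.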
