Loftus Tyler J, Shickel Benjamin, Ruppert Matthew M, Balch Jeremy A, Ozrazgat-Baslanti Tezcan, Tighe Patrick J, Efron Philip A, Hogan William R, Rashidi Parisa, Upchurch Gilbert R, Bihorac Azra
Department of Surgery, University of Florida Health, Gainesville, Florida, United States of America.
Intelligent Critical Care Center, University of Florida, Gainesville, Florida, United States of America.
PLOS Digit Health. 2022;1(8). doi: 10.1371/journal.pdig.0000085. Epub 2022 Aug 10.
Mistrust is a major barrier to implementing deep learning in healthcare settings. Entrustment could be earned by conveying model certainty, or the probability that a given model output is accurate, but the use of uncertainty estimation for deep learning entrustment is largely unexplored, and there is no consensus regarding optimal methods for quantifying uncertainty. Our purpose is to critically evaluate methods for quantifying uncertainty in deep learning for healthcare applications and to propose a conceptual framework for specifying the certainty of deep learning predictions. We searched the Embase, MEDLINE, and PubMed databases for articles relevant to the study objectives in compliance with PRISMA guidelines, rated study quality using validated tools, and extracted data according to modified CHARMS criteria. Among 30 included studies, 24 described medical imaging applications. All imaging model architectures used convolutional neural networks or a variation thereof. The predominant method for quantifying uncertainty was Monte Carlo dropout, which generates predictions from multiple stochastic forward passes in which different neurons are dropped out and measures the variance across the resulting distribution of predictions. Conformal prediction offered similarly strong performance in estimating uncertainty, along with ease of interpretation and applicability not only to deep learning but also to other machine learning approaches. Among the six articles describing non-imaging applications, model architectures and uncertainty estimation methods were heterogeneous, but predictive performance was generally strong, and uncertainty estimation was effective for comparing modeling methods. Overall, the use of model learning curves to quantify epistemic uncertainty (uncertainty attributable to model parameters) was sparse. Heterogeneity in reporting methods precluded meta-analysis. Uncertainty estimation methods have the potential to identify rare but important misclassifications made by deep learning models and to compare modeling methods, which could build patient and clinician trust in deep learning applications in healthcare. Efficient maturation of this field will require standardized guidelines for reporting performance and uncertainty metrics.
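As a minimal illustration of the Monte Carlo dropout approach described above, the sketch below (in PyTorch, not drawn from any of the reviewed studies) keeps dropout active at inference time, runs repeated stochastic forward passes, and reports the mean prediction alongside its variance as an uncertainty estimate. The model architecture, layer sizes, input data, and number of samples are all hypothetical placeholders.

import torch
import torch.nn as nn

# Hypothetical binary classifier with a dropout layer (illustrative only).
model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 1),
    nn.Sigmoid(),
)

def mc_dropout_predict(model, x, n_samples=50):
    # Keep dropout active at inference so each forward pass samples a
    # different sub-network, then summarize the resulting predictive
    # distribution by its mean and variance.
    model.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.var(dim=0)

x = torch.randn(8, 32)                      # batch of 8 hypothetical inputs
mean_pred, uncertainty = mc_dropout_predict(model, x)

In practice, inputs with high predictive variance could be flagged for clinician review, consistent with the review's framing of uncertainty estimates as a way to surface rare but important misclassifications.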