Diagnostic Imaging Analysis Group, Medical Imaging Department, Radboud University Medical Center, Geert Grooteplein Zuid 10, 6525 GA, Nijmegen, the Netherlands.
Department of Medicine, Section of Pulmonary Medicine, Herlev-Gentofte Hospital, Hellerup, Denmark.
Eur Radiol. 2024 Oct;34(10):6639-6651. doi: 10.1007/s00330-024-10714-7. Epub 2024 Mar 27.
To investigate the effect of uncertainty estimation on the performance of a deep learning (DL) algorithm for estimating malignancy risk of pulmonary nodules.
In this retrospective study, we integrated an uncertainty estimation method into a previously developed DL algorithm for nodule malignancy risk estimation. Uncertainty thresholds were developed using CT data from the Danish Lung Cancer Screening Trial (DLCST), containing 883 nodules (65 malignant) collected between 2004 and 2010. We set thresholds at the 90th and 95th percentiles of the uncertainty score distribution to categorize nodules into certain and uncertain groups. External validation was performed on clinical CT data from a tertiary academic center containing 374 nodules (207 malignant) collected between 2004 and 2012. DL performance was measured using the area under the ROC curve (AUC) for the full set of nodules, for the certain cases, and for the uncertain cases. Additionally, nodule characteristics were compared between the groups to identify characteristics associated with uncertainty.
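The grouping and evaluation steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`auc_score`, `fit_uncertainty_threshold`, `auc_by_certainty`) are hypothetical, the uncertainty score is treated as an opaque per-nodule number because the abstract does not say how it is computed, and labelling a nodule as uncertain when its score is strictly above the threshold is an assumption.

```python
import numpy as np


def auc_score(labels, scores):
    """AUC as the probability that a malignant nodule outranks a benign one.

    Ties between a malignant and a benign score count as half a win.
    Returns NaN when a group contains only one class.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")
    diff = pos[:, None] - neg[None, :]  # all malignant-vs-benign pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


def fit_uncertainty_threshold(dev_uncertainty, percentile=90.0):
    """Threshold at a percentile of the development-set uncertainty scores."""
    return float(np.percentile(np.asarray(dev_uncertainty, dtype=float), percentile))


def auc_by_certainty(labels, risk, uncertainty, threshold):
    """AUC for all nodules, the certain group, and the uncertain group.

    Assumption: a nodule is 'uncertain' when its score is strictly above
    the threshold; the abstract does not specify the comparison.
    """
    labels = np.asarray(labels)
    risk = np.asarray(risk, dtype=float)
    uncertain = np.asarray(uncertainty, dtype=float) > threshold
    return {
        "all": auc_score(labels, risk),
        "certain": auc_score(labels[~uncertain], risk[~uncertain]),
        "uncertain": auc_score(labels[uncertain], risk[uncertain]),
        "n_uncertain": int(uncertain.sum()),
    }
```

In this sketch the threshold would be fitted once on the development data (DLCST in the study) and then applied unchanged to the external dataset, so the external evaluation does not influence where the cut-off lies.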
The DL algorithm performed significantly worse in the uncertain group than in the certain group, both in DLCST (AUC 0.62 (95% CI: 0.49, 0.76) vs 0.93 (95% CI: 0.88, 0.97); p < .001) and in the clinical dataset (AUC 0.62 (95% CI: 0.50, 0.73) vs 0.90 (95% CI: 0.86, 0.94); p < .001). The uncertain group included larger benign nodules as well as more part-solid and non-solid nodules than the certain group.
The integrated uncertainty estimation showed excellent performance for identifying uncertain cases in which the DL-based nodule malignancy risk estimation algorithm had significantly worse performance.
Deep learning algorithms often lack the ability to gauge and communicate uncertainty. For safe clinical implementation, uncertainty estimation is of pivotal importance to identify cases where the deep learning algorithm is unsure of its prediction.
• Deep learning (DL) algorithms often lack uncertainty estimation, which could reduce the risk of errors and improve safety during clinical adoption of the DL algorithm.
• Uncertainty estimation identifies pulmonary nodules in which the discriminative performance of the DL algorithm is significantly worse.
• Uncertainty estimation can further enhance the benefits of the DL algorithm and improve its safety and trustworthiness.