Blattmann Malte, Lindenmeyer Adrian, Franke Stefan, Neumuth Thomas, Schneider Daniel
Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Semmelweisstraße 14, Leipzig, Germany.
PLOS Digit Health. 2025 Jul 29;4(7):e0000801. doi: 10.1371/journal.pdig.0000801. eCollection 2025 Jul.
Deep learning models offer transformative potential for personalized medicine by providing automated, data-driven support for complex clinical decision-making. However, their reliability degrades on out-of-distribution inputs, and traditional point-estimate predictors can give overconfident outputs even in regions where the model has little evidence. This shortcoming highlights the need for decision-support systems that quantify and communicate per-query epistemic (knowledge) uncertainty. Approximate Bayesian deep learning methods address this need by introducing principled uncertainty estimates over the model's function. In this work, we compare three such methods on the task of predicting prostate cancer-specific mortality for treatment planning, using data from the PLCO cancer screening trial. All approaches achieve strong discriminative performance (AUROC = 0.86) and produce well-calibrated probabilities in-distribution, yet they differ markedly in the fidelity of their epistemic uncertainty estimates. We show that implicit functional-prior methods, specifically neural network ensembles and variational Bayesian neural networks with factorized weight priors, exhibit reduced fidelity when approximating the posterior distribution and yield systematically biased estimates of epistemic uncertainty. By contrast, models employing explicitly defined, distance-aware priors, such as spectral-normalized neural Gaussian processes (SNGP), provide more accurate posterior approximations and more reliable uncertainty quantification. These properties make explicitly distance-aware architectures particularly promising for building trustworthy clinical decision-support tools.
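To make the notion of per-query epistemic uncertainty concrete, the following is a minimal sketch of one common way to obtain it from an ensemble of binary classifiers: the entropy of the averaged prediction (total uncertainty) minus the average entropy of the individual members (aleatoric part) leaves a disagreement term that is often read as epistemic uncertainty. This is an illustrative assumption, not necessarily the estimator used in the paper; the function names `binary_entropy` and `decompose_uncertainty` and the toy probabilities are hypothetical.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def decompose_uncertainty(member_probs):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    member_probs: array of shape (n_members, n_queries) holding each
    ensemble member's predicted probability of the positive class.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_prob = member_probs.mean(axis=0)                  # ensemble prediction
    total = binary_entropy(mean_prob)                      # entropy of the mean
    aleatoric = binary_entropy(member_probs).mean(axis=0)  # mean of member entropies
    epistemic = total - aleatoric                          # member disagreement
    return mean_prob, total, aleatoric, epistemic


# Toy values (made up for illustration): the members agree on query 0
# and disagree strongly on query 1.
probs = np.array([
    [0.10, 0.05],
    [0.12, 0.50],
    [0.09, 0.95],
])
mean_prob, total, aleatoric, epistemic = decompose_uncertainty(probs)
print("mean probability:", mean_prob)
print("epistemic term:  ", epistemic)
```

In this toy input the epistemic term is larger for the second query, where the members disagree. As the abstract notes, ensemble-based estimates of this kind can be systematically biased, so the sketch shows how such a quantity is computed rather than a guarantee that it is reliable.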