Computational Science Initiative, Brookhaven National Laboratory, Upton, NY, USA.
School of Computer Science, The University of Oklahoma, Norman, OK, USA.
Sci Rep. 2022 Oct 24;12(1):17821. doi: 10.1038/s41598-022-22118-y.
In recent years, data-driven, deep-learning-based models have shown great promise in medical risk prediction. Using large-scale Electronic Health Record (EHR) data from the U.S. Department of Veterans Affairs (VA), the largest integrated healthcare system in the United States, we have developed an automated, personalized risk prediction model to support clinical decision-making for patients with localized prostate cancer. The method combines the representational power of deep learning with the analytical interpretability of parametric regression models, and it can incorporate both time-dependent and static input data. To evaluate model performance comprehensively, we calculate time-dependent C-statistics over 2-, 5-, and 10-year time horizons, using either a composite outcome or prostate cancer mortality as the target event. The composite outcome combines the Prostate-Specific Antigen (PSA) test, metastasis, and prostate cancer mortality. Our longitudinal model, the Recurrent Deep Survival Machine (RDSM), achieved time-dependent C-statistics of 0.85 (0.83), 0.80 (0.83), and 0.76 (0.81), while the cross-sectional model, the Deep Survival Machine (DSM), attained 0.85 (0.82), 0.80 (0.82), and 0.76 (0.79) for the 2-, 5-, and 10-year composite (mortality) outcomes, respectively. In addition to estimating the survival probability, our method quantifies the uncertainty associated with each prediction, and the uncertainty scores correlate consistently with prediction accuracy. We find that PSA and prostate cancer stage information are the most important indicators for risk prediction. Our work demonstrates the utility of data-driven machine learning models in prostate cancer risk prediction, where they can play a critical role in clinical decision support systems.
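A time-dependent C-statistic measures how well predicted risks rank patients by event time within a given horizon. For illustration only, the sketch below computes a simplified, unweighted concordance index truncated at a horizon. It omits the inverse-probability-of-censoring weights that estimators such as Uno's C typically apply. The function name `truncated_c_index` and its inputs are hypothetical and do not reproduce the authors' evaluation code.

```python
from typing import Sequence


def truncated_c_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float], horizon: float) -> float:
    """Simplified (unweighted) concordance for events up to `horizon`.

    Assumption: `risks[k]` is the model's predicted risk for subject k at the
    horizon (e.g. 1 - S(horizon | x_k)); no censoring weights are applied.
    A pair (i, j) is comparable when subject i has an observed event at or
    before the horizon and subject j is still event-free at that time
    (times[j] > times[i]). The pair is concordant when subject i receives
    the higher predicted risk; tied risks count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only observed events within the horizon anchor a comparable pair.
        if not events[i] or times[i] > horizon:
            continue
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs before the horizon")
    return concordant / comparable


# Toy data: four subjects with a 2.5-year horizon.
print(truncated_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.2, 0.5, 0.1], 2.5))
```

With this toy data, five pairs are comparable and four are ranked correctly, so the score is 0.8. A value of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect ranking.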