Advanced Computing for Health Sciences Group, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA.
J Am Med Inform Assoc. 2022 Sep 12;29(10):1737-1743. doi: 10.1093/jamia/ocac106.
The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, by some out-of-the-box machine learning approaches. In this article, we present a machine learning method suited to time-to-event modeling of long-term disease progression in prostate cancer. Using XGBoost adapted to long-term disease progression, we developed a predictive model for 118 788 patients from the Department of Veterans Affairs (VA) who had localized prostate cancer at diagnosis. Our model accounted for patient censoring. Using only features available at the time of diagnosis, our model achieved a Harrell's c-index of 0.757 (95% confidence interval [0.756, 0.757]). Our results show that machine learning methods such as XGBoost can be adapted to use an accelerated failure time (AFT) objective with censoring to model the long-term risk of disease progression. The long median survival time both justifies and requires accounting for censoring. Overall, we show that an existing machine learning approach can be used for AFT outcome modeling in prostate cancer and, more generally, in other chronic diseases with long observation times.
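To illustrate what "AFT with censoring" means as a training objective, here is a minimal sketch of a per-patient negative log-likelihood. It assumes a log-normal AFT model, ln T = f(x) + σZ with Z standard normal, which corresponds to one of the distributions XGBoost's `survival:aft` objective supports (`aft_loss_distribution='normal'`). The function names `aft_nll` and `mean_aft_nll` and the right-censoring-only setup are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence

# Constant term 0.5 * ln(2*pi) from the standard normal log-density.
_LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def normal_sf(z: float) -> float:
    """Standard normal survival function 1 - Phi(z), computed via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def aft_nll(time: float, observed: bool, pred_log_time: float, sigma: float = 1.0) -> float:
    """Negative log-likelihood of one patient under an assumed log-normal AFT model.

    Assumed model: ln T = pred_log_time + sigma * Z, with Z ~ N(0, 1).
    """
    if time <= 0.0 or sigma <= 0.0:
        raise ValueError("time and sigma must be positive")
    z = (math.log(time) - pred_log_time) / sigma
    if observed:
        # Event observed at `time`: -log f(t), where f(t) = phi(z) / (sigma * t).
        return 0.5 * z * z + _LOG_SQRT_2PI + math.log(sigma * time)
    # Right-censored at `time`: we only know T > time, so the patient
    # contributes -log S(t) = -log(1 - Phi(z)). The floor avoids log(0)
    # when z is extremely large.
    return -math.log(max(normal_sf(z), 1e-300))


def mean_aft_nll(times: Sequence[float], observed: Sequence[bool],
                 pred_log_times: Sequence[float], sigma: float = 1.0) -> float:
    """Average AFT loss over a cohort that mixes observed and censored patients."""
    if not (len(times) == len(observed) == len(pred_log_times)) or len(times) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(aft_nll(t, e, p, sigma)
                for t, e, p in zip(times, observed, pred_log_times))
    return total / len(times)


# Usage: one patient progressed at t = 1, another was censored at t = 1,
# and the model predicts ln T = 0 for both.
print(mean_aft_nll([1.0, 1.0], [True, False], [0.0, 0.0]))
```

A censored patient is never treated as an event: the loss only rewards predictions that place the event after the censoring time. In XGBoost itself, the same information is passed as interval bounds via `DMatrix.set_float_info('label_lower_bound', ...)` and `'label_upper_bound'`, with the upper bound set to `+inf` for right-censored patients.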
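Harrell's c-index is the fraction of comparable patient pairs whose predicted ordering matches the observed ordering. The sketch below assumes the model outputs predicted survival times (as an AFT model does), so a shorter predicted time means higher risk. It counts ties in predictions as half-concordant and skips pairs with tied observed times, which is one of several tie conventions in use; the function name `harrell_c_index` is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    predicted_times: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event and
    times[i] < times[j]. It is concordant when the model also predicts a
    shorter time for i. Ties in predicted time count as 0.5.
    """
    n = len(times)
    if not (len(events) == n == len(predicted_times)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Usage: patient 3 is censored; 4 of the 5 comparable pairs are concordant.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, False, True]
print(harrell_c_index(times, events, [1.0, 3.0, 2.0, 4.0]))  # 0.8
```

The double loop is O(n²), so for a cohort of 118 788 patients a library implementation such as lifelines' `concordance_index` is more practical. Note that lifelines expects scores where larger values mean longer survival.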