Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA.
Department of Medical Oncology, City of Hope Medical Center, Duarte, CA.
JCO Clin Cancer Inform. 2020 Jul;4:637-646. doi: 10.1200/CCI.20.00002.
Shapley additive explanation (SHAP) values represent a unified approach to interpreting predictions made by complex machine learning (ML) models, with superior consistency and accuracy compared with prior methods. We describe a novel application of SHAP values to the prediction of mortality risk in prostate cancer.
Patients with nonmetastatic, node-negative prostate cancer, diagnosed between 2004 and 2015, were identified using the National Cancer Database. Model features were specified a priori: age, prostate-specific antigen (PSA), Gleason score, percent positive cores (PPC), comorbidity score, and clinical T stage. We trained a gradient-boosted tree model and applied SHAP values to model predictions. Open-source libraries in Python 3.7 were used for all analyses.
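The abstract does not name the specific open-source libraries used (`xgboost` and `shap` are typical choices for this pipeline). To illustrate the attribution concept underlying SHAP without those dependencies, here is a minimal, exact Shapley-value computation on a hypothetical toy risk score; the model, coefficients, feature scaling, and baseline are all invented for illustration and are not from the study:

```python
from itertools import combinations
from math import factorial

def toy_risk(x):
    # Hypothetical toy risk score with a Gleason x PPC interaction term
    # (coefficients invented for illustration, not fitted to any data).
    psa, gleason, ppc = x
    return 0.1 * psa + 0.5 * gleason + 0.3 * ppc + 0.8 * gleason * ppc

def shapley_values(f, x, baseline):
    """Exact Shapley attributions: each feature's weighted average marginal
    contribution over all subsets of the remaining features (features not
    'present' in a subset are set to their baseline value)."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for S in combinations(others, size):
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi += w * (f(with_i) - f(without_i))
        phis.append(phi)
    return phis

x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
# Efficiency property: attributions sum exactly to f(x) - f(baseline).
assert abs(sum(phi) - (toy_risk(x) - toy_risk(baseline))) < 1e-9
```

In this toy model the linear terms are attributed entirely to their own features, while the 0.8 Gleason-PPC interaction term is split evenly between the two interacting features; SHAP tree explainers compute the same quantity efficiently for gradient-boosted models without enumerating subsets.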
We identified 372,808 patients meeting the inclusion criteria. When analyzing the interaction between PSA and Gleason score, we demonstrated consistency with the literature using the example of low-PSA, high-Gleason prostate cancer, recently identified as a unique entity with a poor prognosis. When analyzing the PPC-Gleason score interaction, we identified a novel finding: stronger interaction effects in patients with Gleason ≥ 8 disease than in those with Gleason 6-7 disease, particularly with PPC ≥ 50%. Subsequent confirmatory linear analyses supported this finding: 5-year overall survival in Gleason ≥ 8 patients was 87.7% with PPC < 50% versus 77.2% with PPC ≥ 50% (P < .001), compared with 89.1% versus 86.0% in Gleason 7 patients (P < .001), with a significant interaction term between PPC ≥ 50% and Gleason ≥ 8 (P < .001).
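The interaction effects described above correspond, in SHAP terms, to off-diagonal entries of the SHAP interaction matrix. The following sketch computes the pairwise Shapley interaction index directly on a hypothetical toy risk score (all names and coefficients are illustrative, not from the study); in practice, tree-based explainers return these values efficiently for a fitted gradient-boosted model:

```python
from itertools import combinations
from math import factorial

def toy_risk(x):
    # Hypothetical toy risk score with a Gleason x PPC interaction term
    # (coefficients invented for illustration, not fitted to any data).
    psa, gleason, ppc = x
    return 0.1 * psa + 0.5 * gleason + 0.3 * ppc + 0.8 * gleason * ppc

def masked_eval(f, x, baseline, present):
    """Evaluate f with features in `present` taken from x, others from baseline."""
    return f([x[k] if k in present else baseline[k] for k in range(len(x))])

def shap_interaction(f, x, baseline, i, j):
    """Pairwise Shapley interaction index Phi_ij: the weighted average, over
    subsets S of the remaining features, of the second-order difference
    f(S+i+j) - f(S+i) - f(S+j) + f(S)."""
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        for S in combinations(others, size):
            w = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
            Sset = set(S)
            delta = (masked_eval(f, x, baseline, Sset | {i, j})
                     - masked_eval(f, x, baseline, Sset | {i})
                     - masked_eval(f, x, baseline, Sset | {j})
                     + masked_eval(f, x, baseline, Sset))
            total += w * delta
    return total

x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
# The Gleason-PPC interaction (features 1 and 2) appears in two symmetric
# matrix entries, each carrying half of the 0.8 interaction coefficient.
assert abs(shap_interaction(toy_risk, x, baseline, 1, 2) - 0.4) < 1e-9
```

A nonzero off-diagonal entry is exactly the kind of signal visualized in the paper's dependence plots: the attribution to PPC changes depending on the Gleason score, rather than adding independently.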
We describe a novel application of SHAP values for modeling and visualizing nonlinear interaction effects in prostate cancer. This ML-based approach is a promising technique with the potential to meaningfully improve risk stratification and staging systems.