Kim Su Il, Kang Jeong Wook, Eun Young-Gyu, Lee Young Chan
Department of Otolaryngology-Head and Neck Surgery, Kyung Hee University School of Medicine, Seoul, South Korea.
Front Oncol. 2022 Aug 22;12:974678. doi: 10.3389/fonc.2022.974678. eCollection 2022.
We determined appropriate machine learning models for predicting survival in patients with oropharyngeal squamous cell carcinoma (OPSCC) using the Surveillance, Epidemiology, and End Results (SEER) database.
In total, 4039 patients diagnosed with OPSCC between 2004 and 2016 were enrolled in this study. Thirteen variables were selected and analyzed: age, sex, tumor grade, tumor size, neck dissection, radiation therapy, cancer-directed surgery, chemotherapy, T stage, N stage, M stage, clinical stage, and human papillomavirus (HPV) status. The T, N, and clinical stages were reconstructed according to the American Joint Committee on Cancer (AJCC) Staging Manual, 8th Edition. Patients were randomly assigned to a development or test dataset at a 7:3 ratio. The extremely randomized survival tree (EST), conditional survival forest (CSF), and DeepSurv models were used to predict overall and disease-specific survival in patients with OPSCC. For all models, 10-fold cross-validation on the development dataset was used to generate the training and internal validation data. The predictive performance of each model was evaluated on the test dataset.
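The data partitioning described above (a random 7:3 development/test split, then 10-fold cross-validation inside the development set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the abstract does not say whether the split was stratified or which seed was used, so the sketch assumes a simple unstratified shuffle, and the function names `split_development_test` and `k_fold_splits` and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Tuple


def split_development_test(n: int, test_fraction: float = 0.3,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign row indices 0..n-1 to a development and a test set.

    Assumption: an unstratified shuffle; the hypothetical `seed` only
    makes the illustration reproducible.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)  # 30% held out for final testing
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def k_fold_splits(indices: List[int], k: int = 10,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Partition development indices into k (train, validation) pairs.

    Each index lands in exactly one validation fold; the remaining
    k - 1 folds form the training data for that round.
    """
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]  # k disjoint folds
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, val))
    return splits
```

For the cohort size reported here, `split_development_test(4039)` yields 2827 development and 1212 test indices, and `k_fold_splits(dev)` then returns ten training/validation pairs whose validation folds together cover the development set exactly once. The test indices are never touched during cross-validation, which is what keeps the final test-set evaluation independent of model development.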
A higher concordance index (C-index) and a lower integrated Brier score (IBS), root mean square error (RMSE), and mean absolute error (MAE) indicate better performance of a machine learning model. The DeepSurv model had the highest C-index (0.77) and also the lowest IBS (0.08). However, the CSF model had the lowest RMSE and MAE.
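The four metrics named above can be made concrete with a small pure-Python sketch. The abstract does not define how each metric was computed, so the following are assumptions: Harrell's C-index for concordance; the Brier score with inverse-probability-of-censoring weights (censoring curve estimated by Kaplan-Meier), integrated over a time grid with the trapezoidal rule and divided by the grid span for the IBS; and RMSE/MAE taken between the cohort-average predicted survival curve and the Kaplan-Meier curve on that grid. All function names and the input layout (`surv_matrix[i][k]` = predicted survival of patient `i` at `grid[k]`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Steps = List[Tuple[float, float]]  # (time, survival) pairs of a step curve


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # only an observed event can anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # i failed first, so i should look riskier
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Steps:
    """Kaplan-Meier estimate as (time, S(time)) steps at each event time."""
    surv, steps = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
    return steps


def step_value(steps: Steps, t: float, strict: bool = False) -> float:
    """Evaluate a right-continuous step curve at t (left limit if strict)."""
    value = 1.0
    for time, s in steps:
        if time < t or (time == t and not strict):
            value = s
        else:
            break
    return value


def brier_score(t: float, times: Sequence[float], events: Sequence[int],
                surv_at_t: Sequence[float], g_steps: Steps) -> float:
    """Censoring-weighted Brier score of predicted survival at time t."""
    total = 0.0
    for T, e, s in zip(times, events, surv_at_t):
        if T <= t and e == 1:  # died by t: ideal predicted survival is 0
            total += s ** 2 / step_value(g_steps, T, strict=True)
        elif T > t:  # still alive at t: ideal predicted survival is 1
            total += (1.0 - s) ** 2 / step_value(g_steps, t)
    return total / len(times)  # patients censored before t add nothing


def integrated_brier_score(grid: Sequence[float], times: Sequence[float],
                           events: Sequence[int],
                           surv_matrix: Sequence[Sequence[float]]) -> float:
    """Trapezoidal integral of the Brier score over grid, divided by its span."""
    g_steps = kaplan_meier(times, [1 - e for e in events])  # censoring curve G
    scores = [brier_score(t, times, events, [row[k] for row in surv_matrix],
                          g_steps) for k, t in enumerate(grid)]
    area = sum(0.5 * (scores[k] + scores[k - 1]) * (grid[k] - grid[k - 1])
               for k in range(1, len(grid)))
    return area / (grid[-1] - grid[0])


def rmse_mae_vs_km(grid: Sequence[float], times: Sequence[float],
                   events: Sequence[int],
                   surv_matrix: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """RMSE and MAE between the mean predicted curve and the Kaplan-Meier curve."""
    km = kaplan_meier(times, events)
    diffs = [sum(row[k] for row in surv_matrix) / len(surv_matrix)
             - step_value(km, t) for k, t in enumerate(grid)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae
```

As a sanity check, a model that assigns every patient a flat survival probability of 0.5 scores an IBS of 0.25 on uncensored data, while one whose predicted curves drop from 1 to 0 exactly at each patient's event time scores 0. The C-index looks only at the ranking of risk scores, whereas the IBS, RMSE, and MAE measure calibration of the predicted probabilities, which is one reason a single model need not win on every metric.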
We presented several machine learning-based survival prediction models. The CSF model showed better performance in predicting the survival of patients with OPSCC in terms of RMSE and MAE. Machine learning models that generate personalized survival predictions can be used to stratify patients according to various complex risk factors. This could help in designing personalized treatments and predicting prognoses.