Zhang Cheng, Xie Minmin, Zhang Yi, Zhang Xiaopeng, Feng Chong, Wu Zhijun, Feng Ying, Yang Yahui, Xu Hui, Ma Tai
Department of Oncology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, People's Republic of China.
Anhui Provincial Cancer Institute/Anhui Provincial Office for Cancer Prevention and Control, Hefei, People's Republic of China.
J Gastric Cancer. 2022 Apr;22(2):120-134. doi: 10.5230/jgc.2022.22.e12.
This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration.
This study used 79 features covering clinical pathology, laboratory tests, and therapeutic details from 289 patients with GC in whom distant lymphadenopathy presented as the first episode of recurrence or metastasis. Outcomes were all-cause death and survival in months after distant lymph node metastasis. A prediction model was built from candidate outcome predictors using a random survival forest algorithm and validated by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visualize survival prediction based on 2 predictive features.
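The nested cross-validation step can be illustrated with a minimal sketch. It assumes that "5×5" means 5 outer folds, each with 5 inner folds drawn from the outer training set; the abstract does not spell this out, and the function names below are hypothetical. No model is fitted here; the sketch only shows how the 289 patients could be partitioned so that the outer test fold is never used for model selection.

```python
import random

def k_fold_indices(indices, k, rng):
    """Shuffle indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold is never seen during model selection.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# 289 matches the cohort size reported in the abstract; the seed is arbitrary.
splits = list(nested_cv_splits(289))
```

In such a scheme, the inner splits would be used to choose which variables enter the model, and each outer test fold would give an estimate of prediction error on patients not involved in that choice.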
The median survival time of patients with GC and distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT) and yielded a prediction error of 0.353. Including other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT.
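The abstract does not define the prediction error. For random survival forests it is conventionally reported as 1 minus Harrell's concordance index, and under that assumption an error of 0.353 would correspond to a concordance index of about 0.647. The sketch below is a simplified illustration of that calculation on hypothetical toy data, not the authors' implementation; it ignores pairs with tied survival times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed death (event == 1). Ties in risk score count as 0.5.
    Pairs with equal follow-up times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def prediction_error(times, events, risk_scores):
    """Prediction error under the assumed convention: 1 - C-index."""
    return 1.0 - concordance_index(times, events, risk_scores)

# Hypothetical toy data: survival months, death indicator (0 = censored),
# and a model-predicted risk score where higher means worse prognosis.
toy_times = [3.0, 9.2, 14.0, 20.0]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
toy_error = prediction_error(toy_times, toy_events, toy_risk)
```

An error of 0.5 would indicate ranking no better than chance, and 0 would indicate perfect ranking of patients by survival.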
Machine learning is useful for identifying important determinants of cancer survival in high-dimensional datasets. The prealbumin level and the PT at the time of distant lymph node metastasis are the 2 most crucial factors in predicting subsequent survival time in patients with advanced GC.
ChiCTR Identifier: ChiCTR1800019978.