Lan Jie, Wang Heng, Huang Jing, Li Weiyi, Ao Min, Zhang Wanfeng, Mu Junhao, Yang Li, Ran Longke
Department of Bioinformatics, The Basic Medical School of Chongqing Medical University, Chongqing, China.
Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China.
Clin Transl Sci. 2025 Apr;18(4):e70186. doi: 10.1111/cts.70186.
Given that more than 20% of patients with cT1 solid non-small cell lung cancer (NSCLC) present with nodal or extrathoracic metastasis, early detection of metastasis is crucial and urgent for improving therapeutic planning and risk stratification in clinical practice. This study collected clinicopathological variables from the pulmonary nodule and lung cancer database of the First Affiliated Hospital of Chongqing Medical University, covering patients with early-stage (cT1) solitary lung cancer evaluated from November 2018 to October 2022. For feature selection, a random forest model and Shapley Additive Explanations (SHAP) were used to assess the importance of clinical features. Random Forest, Gradient Boosting, and AdaBoost classifiers were then applied to build the final model, and the predictive discrimination of each model was compared using receiver operating characteristic (ROC) curves and precision-recall curves. Based on the feature-importance evaluation, nine features were ultimately used to construct the prediction model. In the internal validation dataset, the Random Forest model yielded an average precision of 0.93 and an area under the curve (AUC) of 0.92 (95% CI: 0.88-0.94), compared with average precisions of 0.87 and 0.91 and AUCs of 0.87 (95% CI: 0.84-0.93) and 0.90 (95% CI: 0.86-0.92) for the Gradient Boosting and AdaBoost classifiers, respectively. In addition, the Random Forest classifier performed best on five other diagnostic indices. Finally, we embedded this model in a user-friendly web application, MoLPre (https://molpre.cqmu.edu.cn/), to assist in predicting metastasis of cT1 solid lung cancer.
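The abstract compares the three classifiers by ROC AUC (with 95% confidence intervals) and by average precision. As a rough illustration of what those summary numbers measure, the sketch below computes them from predicted scores in plain Python. It is not the authors' code: the abstract does not say how the confidence intervals were obtained, so the percentile bootstrap here is an assumption, the function names (`roc_auc`, `average_precision`, `bootstrap_auc_ci`) are hypothetical, labels are assumed to be coded 0/1 (1 = metastasis), and the toy scores are invented for illustration rather than taken from the study.

```python
import random
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the Mann-Whitney probability that a randomly chosen
    positive case is scored above a randomly chosen negative case
    (ties count as one half). Labels are assumed to be 0/1."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@k taken at every rank k where a
    positive case is retrieved (a step-wise area under the precision-recall
    curve). Tied scores keep their input order in this sketch."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    # Rank cases from highest to lowest score (sorted() is stable).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / k  # precision at this rank
    return total / n_pos


def bootstrap_auc_ci(
    y_true: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval for ROC AUC.
    Cases are resampled with replacement; resamples containing only one
    class are skipped because AUC is undefined for them."""
    n = len(y_true)
    if len(set(y_true)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # both classes present in this resample
            stats.append(roc_auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Toy labels (1 = metastasis) and model scores, invented for illustration.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s))  # 0.75
    print(average_precision(y, s))
```

In a comparison like the one reported, each classifier's predicted probabilities on the internal validation set would be passed through the same functions so that the AUCs, intervals, and average precisions are directly comparable.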