Hu Xin, Zhao Shiqiao, Li Yanlun, Heibi Yiluo, Wu Hang, Jiang Yongjie
Department of Respiratory and Critical Care Medicine, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China.
Department of Emergency Medicine, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China.
Sci Rep. 2025 Mar 21;15(1):9714. doi: 10.1038/s41598-025-93842-4.
Malignant pleural effusion (MPE) is a common complication in patients with advanced lung cancer and significantly reduces both survival and quality of life. Effective tools for assessing the prognosis of these patients are urgently needed to enable early intervention. This study retrospectively analyzed patient data collected at the Affiliated Hospital of North Sichuan Medical College from 2013 to 2021, which served as the training cohort and the internal testing cohort. Three external testing cohorts were also included: data from Guang'an People's Hospital (external testing cohort 1), data from Dazhou Central Hospital (external testing cohort 2), and data from the Affiliated Hospital of North Sichuan Medical College from January 1, 2023, to December 31, 2023 (the temporal external testing cohort). Univariate logistic regression (LR) analysis of clinical variables (P < 0.05) was performed, followed by multivariate LR to identify independent predictors for inclusion in nine machine learning models: Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Elastic Net (Enet), Radial Support Vector Machine (rSVM), Multilayer Perceptron (MLP), LR, Light Gradient Boosting Machine (LightGBM), and K-Nearest Neighbors (KNN). The best-performing model was used to develop a nomogram for patient risk stratification. Three variables were identified as significant predictors: treatment regimen, presence of pericardial effusion, and total pleural effusion volume. The LR model performed best, achieving area under the curve (AUC) values of 0.885 in the training cohort, 0.954 in the internal testing cohort, and 0.920 in external testing cohort 1. To further validate the model's robustness, the nomogram developed from the LR model was evaluated in two additional cohorts: external testing cohort 2 and the temporal external testing cohort. The nomogram achieved AUCs of 0.962 in external testing cohort 2 and 0.949 in the temporal external testing cohort, demonstrating strong predictive accuracy. Calibration curves showed excellent agreement between predicted and observed outcomes across all cohorts, and decision curve analysis (DCA) indicated superior clinical utility. The nomogram enabled individualized risk quantification, and survival differed significantly between the high-risk/very high-risk groups and the low-risk/medium-risk groups. In summary, this study evaluated nine machine learning models for prognostic prediction in lung cancer patients with MPE and found that the LR-based model performed best. A nomogram based on this model can effectively stratify patients for prognostic assessment and early intervention.
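As a rough illustration of what an LR-based nomogram computes and how an AUC is obtained, the following minimal Python sketch may help. It is not the authors' model: the intercept, the coefficients, the binary coding of treatment regimen, the unit of effusion volume, and the toy patients are all hypothetical, since the abstract does not report fitted values. The `auc` helper uses the standard pairwise (Mann-Whitney) definition of the area under the ROC curve.

```python
import math

# Hypothetical values for illustration only; the abstract does not
# report the fitted intercept or coefficients.
INTERCEPT = -2.0
COEFS = {
    "treatment": -1.2,            # assumed coding: 1 = received anti-tumor treatment
    "pericardial_effusion": 1.5,  # assumed coding: 1 = pericardial effusion present
    "effusion_volume_l": 0.8,     # assumed unit: total pleural effusion volume in liters
}


def predicted_risk(patient):
    """Logistic-regression risk: sigmoid of the linear predictor."""
    z = INTERCEPT + sum(COEFS[name] * patient[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cohort (invented): label 1 marks the poor-prognosis outcome.
toy_patients = [
    {"treatment": 1, "pericardial_effusion": 0, "effusion_volume_l": 0.5},
    {"treatment": 1, "pericardial_effusion": 1, "effusion_volume_l": 1.0},
    {"treatment": 0, "pericardial_effusion": 0, "effusion_volume_l": 2.0},
    {"treatment": 0, "pericardial_effusion": 1, "effusion_volume_l": 3.0},
]
toy_labels = [0, 0, 1, 1]

toy_risks = [predicted_risk(p) for p in toy_patients]
print("predicted risks:", [round(r, 3) for r in toy_risks])
print("toy AUC:", auc(toy_labels, toy_risks))
```

A nomogram presents the same linear predictor graphically: each predictor contributes points in proportion to its coefficient, and the total maps to a predicted probability. Risk groups such as low, medium, high, and very high would then be defined by cut-offs on that probability, which the abstract does not specify.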