
Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm.

Author information

Sun Tao, Liu Jun, Yuan Houqin, Li Xin, Yan Hui

Affiliations

Department of Hematology and Oncology Laboratory, The Central Hospital of Shaoyang, Shaoyang, Hunan, China.

Department of Scientific Research, The First Affiliated Hospital of Shaoyang University, Shaoyang, Hunan, China.

Publication information

Front Oncol. 2024 Aug 9;14:1403392. doi: 10.3389/fonc.2024.1403392. eCollection 2024.

Abstract

PURPOSE

The objective of this study was to create and validate a machine learning (ML)-based model for predicting the likelihood of lung infections following chemotherapy in patients with lung cancer.

METHODS

A retrospective study was conducted on a cohort of 502 lung cancer patients undergoing chemotherapy. Data on age, body mass index (BMI), underlying diseases, chemotherapy cycles, number of hospitalizations, and various blood test results were collected from medical records. We used the Synthetic Minority Oversampling Technique (SMOTE) to handle class imbalance. Features were screened using the Boruta algorithm and the least absolute shrinkage and selection operator (LASSO). Six ML algorithms, namely logistic regression (LR), random forest (RF), Gaussian naive Bayes (GNB), multi-layer perceptron (MLP), support vector machine (SVM), and K-nearest neighbors (KNN), were then trained using 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, F1 score, calibration curves, decision curves, clinical impact curves, and confusion matrices. In addition, Shapley Additive Explanations (SHAP) analysis was used to interpret the model, clarifying the importance of each feature and the basis of the model's decisions. Finally, we constructed nomograms to make the model's predictions easier to read.
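SMOTE balances a dataset by creating synthetic minority-class samples: it picks a minority sample, picks one of its k nearest minority-class neighbours, and places a new point at a random position on the line segment between them. The sketch below is a minimal pure-Python illustration of that idea only; the function name `smote`, its parameters, and the toy points are hypothetical, and the abstract does not state which implementation or neighbour count the authors used. In practice the `SMOTE` class of the imbalanced-learn library (`k_neighbors` defaults to 5, resampling via `fit_resample(X, y)`) is a common choice.

```python
import math
import random


def smote(minority, n_new, k=5, seed=None):
    """Return n_new synthetic samples interpolated between minority-class points.

    minority: list of feature vectors (lists of floats) from the minority class.
    This is a hypothetical illustration of SMOTE, not the study's code.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)

    # Precompute the k nearest minority-class neighbours of every minority sample
    neighbours = []
    for i, xi in enumerate(minority):
        others = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in others[:k]])

    synthetic = []
    for _ in range(n_new):
        a = rng.randrange(n)             # random minority seed sample
        b = rng.choice(neighbours[a])    # one of its k nearest neighbours
        gap = rng.random()               # uniform position in [0, 1) along the segment
        synthetic.append(
            [xa + gap * (xb - xa) for xa, xb in zip(minority[a], minority[b])]
        )
    return synthetic


# Hypothetical toy minority class: the four corners of a unit square
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(corners, n_new=10, k=2, seed=0)
print(len(new_points))  # → 10
```

Because synthetic points are built from their neighbours, oversampling is usually applied only to the training part of each cross-validation fold (imbalanced-learn's `Pipeline` does this automatically); otherwise information from validation samples can leak into training and inflate the estimated performance. Whether this study resampled inside or outside the folds is not stated in the abstract.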

RESULTS

Combining the Boruta and LASSO methods identified gender, smoking, drinking, chemotherapy cycles, pleural effusion (PE), neutrophil-to-lymphocyte count ratio (NLR), neutrophil-to-monocyte count ratio (NMR), lymphocytes (LYM), and neutrophils (NEUT) as significant predictors. The LR model outperformed the other ML algorithms, achieving an accuracy of 81.80%, a sensitivity of 81.1%, a specificity of 82.5%, an F1 score of 81.6%, and an AUC of 0.888 (95% CI: 0.863-0.911). Furthermore, SHAP analysis identified chemotherapy cycles and smoking as the primary factors driving the model's predictions. Finally, this study constructed interactive and dynamic nomograms.
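Accuracy, sensitivity, specificity, and F1 score all derive from the four cells of a confusion matrix at a fixed decision threshold, while the AUC can be read as the probability that a randomly chosen patient with infection receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes these quantities in pure Python; the function names and the toy counts and scores are hypothetical and are not data from this study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from the four confusion-matrix cells."""
    sensitivity = tp / (tp + fn)   # recall, true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and sensitivity
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


def auc_by_ranking(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, which makes this equal to the
    normalised Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy counts and risk scores, not data from the study
m = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
print({name: round(value, 3) for name, value in m.items()})
print(auc_by_ranking([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```

Unlike sensitivity and specificity, the AUC does not depend on a single classification threshold. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a sort-based rank computation scales better for larger data.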

CONCLUSION

The ML approach, combining demographic and clinical factors, accurately predicted post-chemotherapy lung infections in lung cancer patients. The LR model performed well and may improve the early detection and treatment of these infections in clinical practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b881/11341396/470280b6b617/fonc-14-1403392-g001.jpg