Identification of cachexia in lung cancer patients with an ensemble learning approach.

Author information

Jia Pingping, Zhao Qianqian, Wu Xiaoxiao, Shen Fangqi, Sun Kai, Wang Xiaolin

Affiliations

Department of Clinical Nutrition, Beijing Shijitan Hospital, Capital Medical University, Beijing, China.

Publication information

Front Nutr. 2024 May 30;11:1380949. doi: 10.3389/fnut.2024.1380949. eCollection 2024.

Abstract

OBJECTIVE

Nutritional intervention prior to the occurrence of cachexia will significantly improve the survival rate of lung cancer patients. This study aimed to establish an ensemble learning model based on anthropometry and blood indicators without information on body weight loss to identify the risk factors of cachexia for early administration of nutritional support and for preventing the occurrence of cachexia in lung cancer patients.

METHODS

This multicenter study included 4,712 lung cancer patients. The least absolute shrinkage and selection operator (LASSO) method was used to select the key variables, and weight loss information was excluded from the candidate features. The study data were randomly divided into a training set (70%) and a test set (30%). The training set was used to select the optimal model among 18 machine learning models for predicting the occurrence of cachexia, and the performance of the selected model was then verified on the test set. Performance was assessed using area under the curve (AUC), accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC).
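As an illustration of the threshold-based metrics named above, the following minimal sketch computes them from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study; the kappa term assumes Cohen's kappa, which the abstract does not state explicitly. AUC is omitted because it requires predicted probabilities rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives for the positive class (here: cachexia).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Matthews correlation coefficient: uses all four cells, so it stays
    # informative when the classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa (assumed definition): observed agreement corrected
    # for the agreement expected by chance.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = ((p_observed - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not study data).
metrics = binary_metrics(tp=60, fp=15, fn=40, tn=185)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With roughly 30% of patients in the positive class, accuracy alone can look favorable even when many cases are missed, which is one reason to report recall, F1 score, and MCC alongside it.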

RESULTS

Among the 4,712 patients, 1,392 (29.5%) were diagnosed with cachexia based on the framework of Fearon et al. A 17-variable gradient boosting classifier (GBC) model, including body mass index (BMI), feeding situation, tumor stage, neutrophil-to-lymphocyte ratio (NLR), and some gastrointestinal symptoms, was selected among the 18 machine learning models. The GBC model showed good performance in predicting cachexia in the training set (AUC = 0.854, accuracy = 0.819, precision = 0.771, recall = 0.574, F1 score = 0.658, MCC = 0.549, and kappa = 0.538). These values were confirmed in the test set (AUC = 0.859, accuracy = 0.818, precision = 0.801, recall = 0.550, F1 score = 0.652, MCC = 0.552, and kappa = 0.535). The learning curve, decision boundary, precision-recall (PR) curve, receiver operating characteristic (ROC) curve, classification report, and confusion matrix for the test set demonstrated good performance. The feature importance diagram showed the contribution of each feature to the model.
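The reported figures can be cross-checked against one another, since the F1 score is the harmonic mean of precision and recall. The short sketch below does this using only the values quoted above; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract.
reported = {
    "training": (0.771, 0.574, 0.658),
    "test": (0.801, 0.550, 0.652),
}

recomputed = {}
for split, (precision, recall, reported_f1) in reported.items():
    recomputed[split] = round(f1_from(precision, recall), 3)
    print(f"{split}: recomputed F1 = {recomputed[split]}, "
          f"reported F1 = {reported_f1}")

# Share of patients diagnosed with cachexia: 1,392 of 4,712.
prevalence_pct = round(100 * 1392 / 4712, 1)
print(f"cachexia prevalence = {prevalence_pct}%")
```

Both recomputed F1 values match the reported ones to three decimals. In both splits recall is well below precision, so at the default decision threshold the model misses a sizable share of cachexia cases while producing comparatively few false alarms.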

CONCLUSIONS

The GBC model established in this study could facilitate the identification of cachexia in lung cancer patients when weight loss information is unavailable, which would guide early implementation of nutritional interventions to decrease the occurrence of cachexia and improve overall survival (OS).

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b6db/11169803/3c84e62c5d6d/fnut-11-1380949-g0001.jpg
