Meng Xiangfu, Tian Youfa, Zhang Xiaoyan
School of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125000, P. R. China.
Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):70-79. doi: 10.7507/1001-5515.202305038.
Lung cancer is one of the malignant tumors that pose the greatest threat to human health, and studies have shown that some genes play an important regulatory role in its occurrence and development. This paper proposes a LightGBM ensemble learning method that builds a prognostic model from immune-related gene (IRG) profile data and clinical data to predict the survival rate of patients with lung adenocarcinoma. First, the method used the Limma package for differential gene expression analysis, used Cox proportional hazards (CoxPH) regression analysis to screen for prognosis-related IRGs, and then used the XGBoost algorithm to score the importance of the IRG features. Finally, LASSO regression analysis was used to select the IRGs suitable for constructing a prognostic model, yielding a total of 17 IRG features. LightGBM was then trained on the selected IRGs. The K-means algorithm was used to divide the patients into three groups, and the area under the receiver operating characteristic (ROC) curve (AUC) of the model output showed that the accuracy of the model in predicting the survival rates of the three groups of patients was 96%, 98% and 96%, respectively. The experimental results show that the proposed model can divide patients with lung adenocarcinoma into three groups [5-year survival rate higher than 65% (group 1), lower than 65% but higher than 30% (group 2), and lower than 30% (group 3)] and can accurately predict the 5-year survival rate of these patients.
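The grouping and evaluation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper derives the three groups with K-means, whereas this sketch simply labels patients using the reported 5-year survival thresholds (65% and 30%). Treating the three reported AUC values as one-vs-rest AUCs is an assumption, as are the function names, the handling of values exactly at a threshold, and the toy probabilities, all of which are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_group(survival_5y: float) -> int:
    """Map a predicted 5-year survival rate (0..1) to groups 1-3.

    Thresholds come from the abstract. Values exactly at 0.65 or 0.30
    are placed in group 2 here; the source does not specify this.
    """
    if survival_5y > 0.65:
        return 1
    if survival_5y < 0.30:
        return 3
    return 2


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(true_groups: Sequence[int],
                    probs: Sequence[Sequence[float]]) -> Dict[int, float]:
    """Compute one AUC per group, treating that group as the positive class.

    probs[i][g - 1] is the model's score that patient i belongs to group g.
    """
    result: Dict[int, float] = {}
    for g in (1, 2, 3):
        labels = [1 if t == g else 0 for t in true_groups]
        scores = [row[g - 1] for row in probs]
        result[g] = roc_auc(labels, scores)
    return result


# Hypothetical toy data: six patients, two per group (not from the paper).
toy_groups: List[int] = [1, 1, 2, 2, 3, 3]
toy_probs: List[List[float]] = [
    [0.80, 0.15, 0.05],
    [0.50, 0.40, 0.10],
    [0.20, 0.70, 0.10],
    [0.30, 0.30, 0.40],
    [0.10, 0.20, 0.70],
    [0.05, 0.35, 0.60],
]
toy_auc = one_vs_rest_auc(toy_groups, toy_probs)
```

In this sketch `toy_auc` holds one AUC per group, which mirrors how three separate figures could be reported for three patient groups. In practice the per-group scores would come from a trained classifier such as LightGBM rather than from hand-written values.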