Department of Radiotherapy, Chongqing University Cancer Hospital, Chongqing, China.
Department of Equipment, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China.
J Cancer Res Clin Oncol. 2023 Jul;149(7):3915-3924. doi: 10.1007/s00432-022-04312-7. Epub 2022 Aug 26.
To use weighted gene correlation network analysis (WGCNA) and machine learning algorithms to predict the classification of early pulmonary nodules from public databases.
Expression and clinical data of lung cancer patients were first extracted from public databases (GTEx and TCGA) to study the differentially expressed genes (DEGs) of lung adenocarcinoma (LUAD). The intersection of the results from three R packages (DESeq2, limma, edgeR) was selected as the set of candidate DEGs for further study. WGCNA was used to obtain the modules and key genes relevant to lung cancer classification, and GO and KEGG enrichment analyses were performed. Models were built with two machine learning methods: Least Absolute Shrinkage and Selection Operator (LASSO) regression and the extreme Gradient Boosting (XGBoost) algorithm, which was also used to predict tumor classification.
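The candidate-DEG step described above (keeping only genes that all three differential-expression methods agree on) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene IDs, the fold-change and adjusted p-value cutoffs, and the helper names `significant_genes` and `candidate_degs` are all assumptions, since the abstract does not report the thresholds used.

```python
# Toy per-method results: gene -> (log2 fold change, adjusted p-value).
# Gene IDs, values, and cutoffs are hypothetical placeholders, not study data.
LFC_CUTOFF = 1.0
PADJ_CUTOFF = 0.05


def significant_genes(results, lfc_cutoff=LFC_CUTOFF, padj_cutoff=PADJ_CUTOFF):
    """Return genes passing both the fold-change and adjusted p-value cutoffs."""
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if abs(log2fc) >= lfc_cutoff and padj < padj_cutoff
    }


def candidate_degs(*per_method_results):
    """Keep only genes called significant by every method."""
    gene_sets = [significant_genes(r) for r in per_method_results]
    return set.intersection(*gene_sets) if gene_sets else set()


# Stand-ins for tables exported from DESeq2, limma, and edgeR.
deseq2 = {"geneA": (2.1, 0.001), "geneB": (-1.6, 0.010),
          "geneC": (0.4, 0.200), "geneD": (1.2, 0.030)}
limma = {"geneA": (1.9, 0.002), "geneB": (-1.4, 0.020),
         "geneC": (1.3, 0.030), "geneD": (0.8, 0.040)}
edger = {"geneA": (2.3, 0.001), "geneB": (-1.7, 0.008),
         "geneC": (1.1, 0.040), "geneD": (1.5, 0.060)}

candidates = candidate_degs(deseq2, limma, edger)
print(sorted(candidates))
```

Taking the intersection rather than the union trades sensitivity for specificity: a gene survives only if it clears the cutoffs under all three statistical models.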
DEG analysis identified 1306 DEGs in LUAD. WGCNA module analysis showed that a total of 116 genes were significantly related to classification, and the module genes were mainly related to 14 KEGG pathways. Among the differential genes, LASSO regression analysis identified 10 target genes and the XGBoost model identified 18 genes. The intersection of the above methods yielded 6 genes as classification signatures of early pulmonary nodules: HMGB3, ARHGAP6, TCF21, FCN3, COL6A6, and GOLM1.
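To illustrate how LASSO reduces a list of differential genes to a small set of target genes, the sketch below fits an L1-penalised linear model by cyclic coordinate descent and keeps the features with non-zero coefficients. It rests on several assumptions: the data are synthetic, the penalty `alpha` is arbitrary, and a plain squared-error loss is used for brevity, whereas the abstract does not state the loss, software, or tuning procedure the authors used.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    X is a list of rows with centred columns; y is a centred list of responses.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef while coef is all zeros
    col_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coef[j])
                      for i in range(n)) / n
            new_coef = soft_threshold(rho, alpha) / col_scale[j]
            delta = new_coef - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coef[j] = new_coef
    return coef


# Synthetic expression matrix: 200 samples x 6 hypothetical genes.
rng = random.Random(0)
n_samples, n_genes = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
means = [sum(row[j] for row in X) / n_samples for j in range(n_genes)]
X = [[row[j] - means[j] for j in range(n_genes)] for row in X]

# Only genes 0 and 1 drive the response; genes 2..5 are pure noise.
y = [2.0 * row[0] - 1.5 * row[1] for row in X]

coef = lasso_coordinate_descent(X, y, alpha=0.3)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print(selected)
```

The soft-thresholding step is what sets the coefficients of uninformative genes exactly to zero, so the surviving indices in `selected` play the role of the target genes; raising `alpha` shrinks that list further.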
Using DEG analysis, WGCNA, and machine learning algorithms, this study identified six gene signatures related to early-stage LUAD, which can assist clinicians in predicting disease classification.