Department of Respiratory Medicine, Shaoxing People's Hospital (Shaoxing Hospital, Zhejiang University School of Medicine), Shaoxing, China.
G3 (Bethesda). 2020 Jul 7;10(7):2423-2434. doi: 10.1534/g3.120.401207.
Lung adenocarcinoma (LUAD) is one of the most common malignant tumors. Diagnosing LUAD effectively at an early stage and accurately judging its occurrence and progression remain a focus of current research. The support vector machine (SVM) is one of the most effective methods for diagnosing LUAD at different stages. This study aimed to explore the dynamic changes of differentially expressed genes (DEGs) across LUAD stages, to assess the risk of LUAD through DEG-enriched pathways, and to establish an SVM-based diagnostic model. Based on the TNM stages and gene expression profiles of 517 samples in the TCGA-LUAD database, the coefficient of variation (CV) combined with one-way analysis of variance (ANOVA) was used, after data standardization, to screen out feature genes in different TNM stages. Unsupervised clustering analysis was conducted on the samples and feature genes. The feature genes were analyzed by Pearson correlation coefficient to construct a co-expression network. Fisher's exact test was conducted to verify the most enriched pathways, and the variation of each pathway across stages was analyzed. SVM models were trained, and ROC curves were drawn from the predicted results to evaluate the predictive performance of the SVM model. Unsupervised hierarchical clustering showed that almost all stage III/IV samples were clustered together, while stage I/II samples were clustered together. The correlations among feature genes differed between stages. In addition, as the degree of malignancy of lung cancer increased, the average shortest path of the network gradually increased, while the closeness centrality gradually decreased. Finally, four feature pathways that could distinguish different stages of LUAD were obtained, and their discriminative ability was tested with the SVM model, which achieved an accuracy of 91%.
Functional-level differences were quantified from the expression of feature genes in lung cancer patients at different stages, with the aim of aiding the diagnosis and prediction of lung cancer. The accuracy of our model in differentiating stage I/II from stage III/IV could reach 91%.
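The screening and enrichment steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy expression values, and the cutoffs `cv_min` and `f_min` are all hypothetical. It also assumes that genes are filtered on the ANOVA F statistic directly (a p-value cutoff would need the F distribution) and that pathway enrichment is tested with a one-sided Fisher exact test.

```python
from math import comb
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (population SD; assumes a non-zero mean)."""
    return pstdev(values) / mean(values)


def anova_f(groups):
    """One-way ANOVA F statistic; `groups` holds one list of values per stage."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def screen_genes(expr_by_stage, cv_min, f_min):
    """Keep genes whose pooled CV and between-stage F both reach the cutoffs."""
    selected = []
    for gene, groups in expr_by_stage.items():
        pooled = [v for g in groups for v in g]
        if coefficient_of_variation(pooled) >= cv_min and anova_f(groups) >= f_min:
            selected.append(gene)
    return selected


def enrichment_p(hits, pathway_size, n_selected, n_genes):
    """One-sided Fisher exact p-value: P(overlap >= hits) under the hypergeometric null."""
    upper = min(pathway_size, n_selected)
    tail = sum(
        comb(pathway_size, i) * comb(n_genes - pathway_size, n_selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(n_genes, n_selected)


# Toy, made-up expression values: two genes, two stage groups (early vs. late).
toy = {
    "GENE_A": [[1.0, 1.2, 0.9], [3.1, 2.8, 3.3]],    # shifts between stages
    "GENE_B": [[2.0, 2.1, 1.9], [2.0, 2.05, 1.95]],  # essentially flat
}
print(screen_genes(toy, cv_min=0.2, f_min=10.0))
print(enrichment_p(hits=3, pathway_size=4, n_selected=5, n_genes=20))
```

In this toy input only `GENE_A` passes both filters, since `GENE_B` barely varies across samples. The enrichment call asks how likely it is that at least 3 of 5 selected genes fall in a 4-gene pathway by chance, given 20 genes in total.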