Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis.

Affiliations

College of Software, Xinjiang University, Urumqi, 830046, Xinjiang, China.

College of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China.

Publication information

Comput Biol Med. 2022 Jun;145:105409. doi: 10.1016/j.compbiomed.2022.105409. Epub 2022 Mar 19.

Abstract

Advanced metastasis makes colon cancer harder to treat. Identifying markers of colon cancer allows the cancer stage to be diagnosed in time and the prognosis to be improved through timely treatment. This paper uses gene expression profiling data from The Cancer Genome Atlas (TCGA) to diagnose colon cancer and its stage. We first selected the gene modules most strongly correlated with cancer using Weighted Gene Co-expression Network Analysis (WGCNA), extracted feature genes from the differential expression results using the least absolute shrinkage and selection operator (Lasso), and performed survival analysis. We then combined the module genes with the Lasso-extracted feature genes to distinguish colon cancer from healthy controls using random forest (RF), support vector machine (SVM), and decision tree classifiers, and diagnosed colon cancer stage using the differentially expressed genes for each stage. A protein-protein interaction (PPI) network was then constructed for 289 genes to identify protein clusters for survival analysis. The RF model performed best: in cross-validation for colon cancer versus controls it reached an average accuracy of 99.81%, an F1 score of 0.9968, a precision of 99.88%, and a recall of 99.5%; for diagnosing colon cancer stages I, II, III, and IV it reached an average accuracy of 91.5%, an F1 score of 0.7679, a precision of 86.94%, and a recall of 73.04%. Eight genes associated with colon cancer prognosis were identified: GCNT2, GLDN, SULT1B1, UGT2B15, PTGDR2, GPR15, BMP5, and CPT2.
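
The reported figures can be related to one another with a little arithmetic. The sketch below is illustrative only and is not the authors' code. It assumes that the 99.88% and 86.94% figures are precision values, and that the four-stage metrics were macro-averaged (computed per stage, then averaged); the abstract states neither. The stage labels in `y_true` and `y_pred` are made-up toy data, and the function names are hypothetical.

```python
def harmonic_f1(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, label):
    """Return (precision, recall, F1) with `label` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = harmonic_f1(precision, recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Average per-class precision, recall, and F1 with equal class weights."""
    labels = sorted(set(y_true))
    per_class = [precision_recall_f1(y_true, y_pred, c) for c in labels]
    return tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))


# Binary task (cancer vs. control): precision and recall taken from the abstract.
binary_f1 = harmonic_f1(0.9988, 0.9950)
print(f"binary F1 from reported precision/recall: {binary_f1:.4f}")

# Toy four-stage example (made-up labels, NOT data from the paper).
y_true = ["I", "I", "II", "II", "II", "III", "III", "IV"]
y_pred = ["I", "II", "II", "II", "III", "III", "III", "III"]
macro_p, macro_r, macro_f1 = macro_average(y_true, y_pred)
print(f"macro precision={macro_p:.4f} recall={macro_r:.4f} F1={macro_f1:.4f}")
print(f"harmonic mean of macro P and R: {harmonic_f1(macro_p, macro_r):.4f}")
```

For the binary task, the harmonic mean of the reported precision and recall lands within rounding distance of the reported F1 of 0.9968. For the staging task, the toy data shows that under macro-averaging the F1 score is the mean of the per-class F1 values, which generally differs from the harmonic mean of the averaged precision and recall. This is one possible reason the reported staging F1 (0.7679) would not be exactly reproducible from the 86.94% and 73.04% figures alone.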
