Department of Biochemistry and Department of Thoracic Surgery of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310003, China.
Radiation Biology Department, National Center for Radiation Research and Technology, Egyptian Atomic Energy Authority, Cairo 13759, Egypt.
Math Biosci Eng. 2021 Oct 19;18(6):8997-9015. doi: 10.3934/mbe.2021443.
Colorectal cancer (CRC) is one of the most common malignancies worldwide. Biomarker discovery is critical to improving CRC diagnosis, and machine learning offers a new platform for studying the etiology of CRC for this purpose. Therefore, the current study aimed to perform an integrated bioinformatics and machine learning analysis to explore novel biomarkers for CRC prognosis. We acquired gene expression microarray data from the Gene Expression Omnibus (GEO) database; the GSE103512 microarray expression dataset was downloaded and integrated. Subsequently, differentially expressed genes (DEGs) were identified and functionally analyzed via Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). Furthermore, protein-protein interaction (PPI) network analysis was conducted using the STRING database and Cytoscape software to identify hub genes, which were then subjected to Support Vector Machine (SVM), receiver operating characteristic (ROC) curve, and survival analyses to explore their diagnostic value. Meanwhile, TCGA transcriptomic data in the Gene Expression Profiling Interactive Analysis (GEPIA) database and pathology data in the Human Protein Atlas (HPA) database were used to verify our transcriptomic analyses. A total of 105 DEGs were identified in this study. Functional enrichment analysis showed that these genes were significantly enriched in biological processes related to cancer progression. PPI network analysis then identified 10 significant hub genes. ROC curves were used to assess the potential of these genes as biomarkers for CRC diagnosis; the area under the ROC curve (AUC) of these genes exceeded 0.92, suggesting that this risk classifier can discriminate between CRC patients and normal controls. Moreover, the prognostic value of these hub genes was confirmed by survival analyses using different CRC patient cohorts.
Our results demonstrated that these 10 differentially expressed hub genes could be used as potential biomarkers for CRC diagnosis.
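The ROC step described in the abstract can be illustrated with a minimal sketch. It computes a per-gene AUC as the probability that a randomly chosen tumor sample has higher expression than a randomly chosen normal sample, which is equivalent to the area under the ROC curve when a single gene's expression is used as the score. This is not the authors' pipeline: the gene names `GENE_A` and `GENE_B`, the expression values, and the helper `roc_auc` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(tumor: Sequence[float], normal: Sequence[float]) -> float:
    """Return the AUC for one gene used as a tumor-vs-normal score.

    The AUC equals the fraction of (tumor, normal) sample pairs in which
    the tumor value is higher; tied pairs count as one half.
    """
    if not tumor or not normal:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Hypothetical expression values for two made-up genes (not study data).
expression = {
    "GENE_A": {"tumor": [8.1, 7.9, 8.4, 7.2], "normal": [6.0, 6.3, 7.5, 5.8]},
    "GENE_B": {"tumor": [5.0, 5.2, 4.9, 5.1], "normal": [5.1, 4.8, 5.3, 5.0]},
}

# One AUC per gene: values near 1.0 separate the groups well,
# values near 0.5 carry no diagnostic information.
auc_by_gene = {
    gene: roc_auc(groups["tumor"], groups["normal"])
    for gene, groups in expression.items()
}

for gene, auc in auc_by_gene.items():
    print(f"{gene}: AUC = {auc:.3f}")
```

In this sketch a gene that is downregulated in tumors yields an AUC below 0.5; its discriminative power is then read as 1 minus the AUC. The pairwise loop is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based implementation from a statistics library would be the usual choice for larger ones.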