Sahoo Karishma, Sundararajan Vino
Integrative Multiomics Lab, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India.
Discov Oncol. 2025 Feb 28;16(1):252. doi: 10.1007/s12672-025-01989-3.
Colorectal cancer (CRC) is the third most common cancer globally, necessitating novel biomarkers for early diagnosis and treatment. This study proposes an efficient pipeline leveraging an integrated bioinformatics and machine learning framework to enhance the identification of diagnostic and prognostic biomarkers for CRC.
Methylated differentially expressed genes (MeDEGs) and features (genes) were selected from publicly available datasets using both statistical and machine learning (ML) approaches. These genes were subjected separately to STRING network construction and hub gene identification. In addition, essential miRNAs (microRNAs) and TFs (transcription factors) acting as regulatory elements were identified, and the findings were validated through scRNA-seq analysis, promoter methylation analysis, correlation of gene expression levels with pathological stage, and interaction with tumor-infiltrating immune cells.
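The selection and hub-gene steps can be illustrated with a minimal sketch. The gene symbols are borrowed from this abstract, but every number, threshold, and network edge below is invented for illustration. The study's actual cutoffs and hub-scoring method are not stated here; pairing expression with opposite-direction methylation and ranking by node degree are assumptions chosen as common conventions, not the authors' confirmed procedure.

```python
from collections import Counter


def select_medegs(log2fc, delta_beta, fc_cut=1.0, beta_cut=0.2):
    """Keep genes whose expression and methylation change in opposite directions.

    log2fc: gene -> log2 fold change (tumor vs. normal), hypothetical values.
    delta_beta: gene -> methylation difference (tumor minus normal), hypothetical values.
    The cutoffs are illustrative defaults, not values taken from the study.
    """
    medegs = {}
    for gene in log2fc.keys() & delta_beta.keys():
        fc, db = log2fc[gene], delta_beta[gene]
        if fc >= fc_cut and db <= -beta_cut:
            medegs[gene] = "hypomethylated-upregulated"
        elif fc <= -fc_cut and db >= beta_cut:
            medegs[gene] = "hypermethylated-downregulated"
    return medegs


def hub_genes(edges, top_n=3):
    """Rank genes by degree (number of interaction partners) in an undirected network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so that ties are deterministic.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Toy inputs (invented for illustration only).
log2fc = {"IL1B": 2.1, "COL1A1": 1.8, "MYC": 1.5, "PXDN": 0.4, "GSK3B": -1.2}
delta_beta = {"IL1B": -0.35, "COL1A1": -0.25, "MYC": 0.05, "PXDN": -0.30, "GSK3B": 0.28}
edges = [
    ("TP53", "MYC"), ("TP53", "CTNNB1"), ("TP53", "EGFR"),
    ("CTNNB1", "GSK3B"), ("CTNNB1", "MYC"), ("EGFR", "SRC"),
]

medegs = select_medegs(log2fc, delta_beta)
hubs = hub_genes(edges, top_n=3)
```

In practice the edge list would come from a STRING export filtered by confidence score, and other centrality measures could replace degree.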
Through an integrated analysis pipeline, we identified 27 hub genes, among which CTNNB1, GSK3B, IL-1β, MYC, PXDN, TP53, EGFR, SRC, COL1A1, and TGFB1 showed better diagnostic performance. The machine learning approach involved developing K-Nearest Neighbors (KNN), Artificial Neural Network (ANN), and Random Forest (RF) models on TCGA datasets, which achieved accuracies between 99 and 100%. The Area Under the Curve (AUC) value for each model was 1.00, signifying good classification performance. High expression of some diagnostic genes was associated with poor prognosis, identifying IL-1β as both a prognostic and a diagnostic biomarker. Additionally, microRNAs (miR-548d-3p, miR-548-ac) and TFs (NF-κB and STAT5A) play a major role in the comprehensive regulatory network for CRC. Furthermore, hub genes such as IL-1β, TGFB1, and COL1A1 were significantly correlated with immune infiltrates, suggesting their potential role in CRC progression.
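To make the reported metrics concrete, the following sketch computes accuracy and AUC from scratch. An AUC of 1.00 means that, in the evaluated data, every positive (tumor) sample received a higher score than every negative (normal) sample. The labels, scores, and the 0.5 decision threshold below are invented for illustration and are not outputs of the study's KNN, ANN, or RF models.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = tumor, 0 = normal (invented for illustration only).
y_true = [1, 1, 1, 0, 0, 0]
separable = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every tumor score exceeds every normal score
overlapping = [0.9, 0.4, 0.7, 0.6, 0.2, 0.1]  # one tumor sample scores below one normal sample

auc_separable = roc_auc(y_true, separable)
auc_overlapping = roc_auc(y_true, overlapping)

# Hypothetical 0.5 threshold to turn scores into class labels.
acc_separable = accuracy(y_true, [1 if s >= 0.5 else 0 for s in separable])
acc_overlapping = accuracy(y_true, [1 if s >= 0.5 else 0 for s in overlapping])
```

The pairwise definition used here is quadratic in the number of samples; it is chosen for readability rather than speed.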
Overall, the elevated expression of IL-1β, coupled with abnormal DNA methylation and its consequent effect on the PI3K/Akt signaling pathway, is a relevant prognostic and therapeutic marker in CRC. Additional molecular candidates provide insight into the epigenetic regulatory targets of CRC and their association with immune cell infiltration.