Zhao Yujia, Li Qian, Cui Xintong, Zhang Zhiyu, You Yong, Hou Xiaowen, Wang Yan, Feng Xu
Department of Health Statistics, School of Public Health, Shenyang Medical College, 146 Huanghe North Street, Shenyang, 110034, China.
Department of Childhood and Maternal and Child Health Care, School of Public Health, Jinzhou Medical University, Jinzhou, 121001, China.
Discov Oncol. 2025 Jul 1;16(1):1217. doi: 10.1007/s12672-025-03048-3.
Colorectal cancer (CRC) ranks third among contributors to the global disease burden and has the second highest mortality rate of all malignancies worldwide. Long non-coding RNAs (lncRNAs) are a new class of regulatory RNAs that play a crucial role in the occurrence and development of colorectal cancer. It is therefore valuable to use bioinformatics and machine learning methods to identify novel biomarkers for CRC.
RNA-seq data from colorectal cancer and normal colorectal tissues were downloaded from the GEO database. Random forest (RF) and least absolute shrinkage and selection operator (LASSO) regression models were constructed to screen for lncRNAs closely related to CRC, and their screening performance was verified. Target genes of the selected lncRNAs were predicted, and a lncRNA-miRNA-mRNA competing endogenous RNA (ceRNA) regulatory network was constructed. Quantitative real-time PCR (qRT-PCR) was used to verify the expression of the key lncRNAs in colorectal cancer tissues and adjacent tissues, and to assess its relationship with the clinical features of CRC patients.
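The two-model screening step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn's `RandomForestClassifier` and `LassoCV`, treats the 0/1 tissue label as a numeric response for LASSO, and runs on synthetic data. The helper names `make_toy_data` and `select_key_features`, the `top_k` cutoff, and the feature names `lnc0`, `lnc1`, ... are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler


def make_toy_data(n_per_class=40, n_features=20, n_informative=3, seed=0):
    """Build a synthetic expression matrix (hypothetical data, not from the study)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)      # 0 = normal, 1 = tumor
    X[y == 1, :n_informative] += 2.0        # shift informative features in tumors
    perm = rng.permutation(len(y))          # shuffle so CV folds mix both classes
    names = [f"lnc{i}" for i in range(n_features)]
    return X[perm], y[perm], names


def select_key_features(X, y, names, top_k=5, seed=0):
    """Return the features chosen by both the random forest and LASSO."""
    # Random forest: keep the top_k features by impurity-based importance
    # (the cutoff is an assumption; the abstract does not state one).
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    rf_idx = np.argsort(rf.feature_importances_)[::-1][:top_k]
    rf_set = {names[i] for i in rf_idx}

    # LASSO: keep the features whose coefficient is non-zero at the
    # cross-validated penalty. Features are standardized first.
    Xs = StandardScaler().fit_transform(X)
    lasso = LassoCV(cv=5, random_state=seed).fit(Xs, y)
    lasso_set = {names[i] for i in np.flatnonzero(lasso.coef_)}

    # Key features are those selected by both methods.
    return sorted(rf_set & lasso_set), rf_set, lasso_set


if __name__ == "__main__":
    X, y, names = make_toy_data()
    key, rf_set, lasso_set = select_key_features(X, y, names)
    print("selected by both methods:", key)
```

Taking the intersection is a conservative choice: a feature must be ranked highly by a nonlinear ensemble and also survive an L1 penalty, which tends to shrink the candidate list.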
A total of 3028 CRC-related lncRNAs were initially identified in the GEO data, and 55 differentially expressed lncRNAs (DE lncRNAs) were retained after differential expression analysis. RF and LASSO were then each applied to these candidates, and the lncRNAs selected by both methods were taken as the key lncRNAs of CRC. This yielded five key lncRNAs (NCAL1, CRNDE, HMGA1P4, EPIST and MT1JP): NCAL1, CRNDE and HMGA1P4 were upregulated in CRC tissues relative to normal colorectal tissues, whereas EPIST and MT1JP were downregulated. Each of the five key lncRNAs had an AUC greater than 0.7, indicating good screening performance. Because CRNDE has previously been studied by members of this research group, it was not investigated further. The remaining four lncRNAs were predicted to interact with 16 miRNAs and 57 mRNAs. These four key lncRNAs (NCAL1, HMGA1P4, EPIST and MT1JP) were then verified experimentally: qRT-PCR showed that their expression differed significantly between CRC tissues and adjacent tissues (p < 0.001).
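The per-lncRNA AUC check can be illustrated with a small pure-Python sketch. The abstract does not say how the AUC was computed; the version below assumes the standard rank-based (Mann-Whitney) definition applied to a single marker's expression values. The function names `auc_from_scores` and `marker_auc` are hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC via the Mann-Whitney statistic; labels are 1 (tumor) or 0 (normal)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # tumor sample ranked above normal sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def marker_auc(scores, labels):
    """Direction-agnostic AUC: a downregulated marker gives a raw AUC below 0.5,
    so report the larger of the AUC and its complement."""
    a = auc_from_scores(scores, labels)
    return max(a, 1.0 - a)
```

Under this convention a downregulated marker such as EPIST or MT1JP would be scored on how well low expression separates tumor from normal tissue, so the same "greater than 0.7" reading applies to both directions.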
In summary, we identified five lncRNAs that may be closely related to colorectal cancer: NCAL1, CRNDE, HMGA1P4, EPIST and MT1JP. Of these, NCAL1, HMGA1P4, EPIST and MT1JP may serve as candidate biomarkers for colorectal cancer.