Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms.

Affiliations

Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India.

National Institute of Animal Biotechnology, Hyderabad 500032, India.

Publication information

Genes (Basel). 2023 Sep 22;14(10):1836. doi: 10.3390/genes14101836.

DOI:10.3390/genes14101836

PMID:37895185

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10606805/

Abstract

Colorectal cancer (CRC) affects the colon or rectum and is a common global health issue, with 1.1 million new cases occurring yearly. The study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760 datasets were pre-processed and subjected to feature selection using the LASSO method in combination with five ML algorithms: AdaBoost, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). The important features were further examined through gene expression, correlation, and survival analyses. Validation on the external dataset GSE142279 was also performed. The RF model had the best classification accuracy for both datasets. The feature selection process identified 12 candidate genes, which were subsequently reduced to 3 (CA2, CA7, and ITM2C) through gene expression and correlation analyses. These three genes achieved 100% accuracy in the external dataset. The AUC values for these genes were 99.24%, 100%, and 99.5%, respectively. The survival analysis showed a significant log-rank p-value of 0.044 for the final gene signatures. The analysis of tumor immunocyte infiltration showed a weak correlation with the expression of the gene signatures. CA2, CA7, and ITM2C can serve as gene signatures for the early detection of CRC and may provide valuable information for prognostic and therapeutic decision making. Further research is needed to fully understand the potential of these genes in the context of CRC.
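
The per-gene AUC values quoted in the abstract measure how well a single gene's expression level separates tumour samples from normal samples. The sketch below is only an illustration of that idea, not the authors' pipeline: it computes the AUC as the Mann-Whitney probability that a randomly chosen tumour sample has a higher value than a randomly chosen normal sample. The function names `rank_auc` and `marker_auc`, the gene name `gene_x`, and all expression values are hypothetical and not taken from the paper.

```python
def rank_auc(values, labels):
    """AUC as the probability that a positive sample (label 1) has a
    higher value than a negative sample (label 0); ties count as half."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def marker_auc(values, labels):
    """Direction-agnostic AUC: a gene that is lower in tumours separates
    the classes just as well as one that is higher."""
    auc = rank_auc(values, labels)
    return max(auc, 1.0 - auc)


# Hypothetical, made-up expression values (NOT data from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumour, 0 = normal
gene_x = [2.1, 1.8, 2.5, 3.9, 6.0, 5.5, 3.2, 7.1]

print(rank_auc(gene_x, labels))    # 0.0625: tumour values are mostly lower
print(marker_auc(gene_x, labels))  # 0.9375 once the direction is ignored
```

In this toy example only one of the 16 tumour-normal pairs has the tumour value on top, so the raw AUC is 1/16 and the direction-agnostic value is 15/16. An AUC of 100%, as the abstract reports for one of the genes, would mean that the two groups do not overlap at all in the evaluated dataset.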

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0794/10606805/6744d0202950/genes-14-01836-g001.jpg

Similar articles

Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms.

Genes (Basel). 2023 Sep 22;14(10):1836. doi: 10.3390/genes14101836.

Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning.

BMC Cancer. 2021 Aug 27;21(1):962. doi: 10.1186/s12885-021-08704-9.

Classification and Diagnostic Prediction of Colorectal Cancer Mortality Based on Machine Learning Algorithms: A Multicenter National Study.

Asian Pac J Cancer Prev. 2024 Jan 1;25(1):333-342. doi: 10.31557/APJCP.2024.25.1.333.

Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods.

BMC Bioinformatics. 2022 Oct 1;23(1):410. doi: 10.1186/s12859-022-04965-8.

High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer.

Int J Mol Sci. 2019 Jan 12;20(2):296. doi: 10.3390/ijms20020296.

Predicting Colorectal Cancer Recurrence and Patient Survival Using Supervised Machine Learning Approach: A South African Population-Based Study.

Front Public Health. 2021 Jul 7;9:694306. doi: 10.3389/fpubh.2021.694306. eCollection 2021.

Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer.

Lab Invest. 2024 Mar;104(3):100320. doi: 10.1016/j.labinv.2023.100320. Epub 2023 Dec 28.

Blood Biomarkers Panels for Screening of Colorectal Cancer and Adenoma on a Machine Learning-Assisted Detection Platform.

Cancer Control. 2023 Jan-Dec;30:10732748231222109. doi: 10.1177/10732748231222109.

Robust biomarker screening from gene expression data by stable machine learning-recursive feature elimination methods.

Comput Biol Chem. 2022 Oct;100:107747. doi: 10.1016/j.compbiolchem.2022.107747. Epub 2022 Jul 29.

Using Machine Learning Methods to Study Colorectal Cancer Tumor Micro-Environment and Its Biomarkers.

Int J Mol Sci. 2023 Jul 6;24(13):11133. doi: 10.3390/ijms241311133.

Cited by

An Integrative Genomics Approach for the Discovery of Potential Clinically Actionable Diagnostic and Prognostic Biomarkers in Colorectal Cancer.

Biomedicines. 2025 Jul 7;13(7):1651. doi: 10.3390/biomedicines13071651.

Migration of Kupffer's vesicle-derived cells is essential for tail morphogenesis in zebrafish embryos.

Development. 2025 Jun 15;152(12). doi: 10.1242/dev.204791. Epub 2025 Jun 19.

Integrating machine learning, bioinformatics and experimental verification to identify a novel prognostic marker associated with tumor immune microenvironment in head and neck squamous carcinoma.

Front Immunol. 2024 Dec 10;15:1501486. doi: 10.3389/fimmu.2024.1501486. eCollection 2024.

Identification of a Prognostic Model Based on NK Cell-Related Genes in Multiple Myeloma Using Single-Cell and Transcriptomic Data Analysis.

Blood Lymphat Cancer. 2024 Jun 4;14:31-48. doi: 10.2147/BLCTT.S461529. eCollection 2024.

USP3 promotes osteosarcoma progression via deubiquitinating EPHA2 and activating the PI3K/AKT signaling pathway.

Cell Death Dis. 2024 Mar 26;15(3):235. doi: 10.1038/s41419-024-06624-7.

Using machine learning approach for screening metastatic biomarkers in colorectal cancer and predictive modeling with experimental validation.

Sci Rep. 2023 Nov 8;13(1):19426. doi: 10.1038/s41598-023-46633-8.
