Unlocking the Potential of the CA2, CA7, and ITM2C Gene Signatures for the Early Detection of Colorectal Cancer: A Comprehensive Analysis of RNA-Seq Data by Utilizing Machine Learning Algorithms.

Affiliations

Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211004, India.

National Institute of Animal Biotechnology, Hyderabad 500032, India.

Publication information

Genes (Basel). 2023 Sep 22;14(10):1836. doi: 10.3390/genes14101836.

Abstract

Colorectal cancer (CRC) affects the colon or rectum and is a common global health issue, with 1.1 million new cases occurring yearly. This study aimed to identify gene signatures for the early detection of CRC by applying machine learning (ML) algorithms to gene expression data. The TCGA-CRC and GSE50760 datasets were pre-processed and subjected to feature selection using the LASSO method in combination with five ML algorithms: Adaboost, Random Forest (RF), Logistic Regression (LR), Gaussian Naive Bayes (GNB), and Support Vector Machine (SVM). The important features were then examined through gene expression, correlation, and survival analyses. The results were also validated on the external dataset GSE142279. The RF model had the best classification accuracy for both datasets. Feature selection identified 12 candidate genes, which were subsequently reduced to 3 (CA2, CA7, and ITM2C) through gene expression and correlation analyses. These three genes achieved 100% accuracy in the external dataset. The AUC values for these genes were 99.24%, 100%, and 99.5%, respectively. The survival analysis showed a significant log-rank p-value of 0.044 for the final gene signatures. The analysis of tumor immunocyte infiltration showed a weak correlation with the expression of the gene signatures. CA2, CA7, and ITM2C can serve as gene signatures for the early detection of CRC and may provide valuable information for prognostic and therapeutic decision making. Further research is needed to fully understand the potential of these genes in the context of CRC.
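The per-gene AUC values quoted in the abstract measure how well a single gene's expression separates tumor from normal samples. As a rough illustration only, the sketch below computes an AUC from the pairwise-ranking definition using the Python standard library. The function name `auc_from_scores`, the toy expression values, and the assumption that the gene is expressed at lower levels in tumors are all hypothetical and are not taken from the study or its datasets.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUC: the probability that a randomly chosen positive
    sample has a higher score than a randomly chosen negative sample.
    Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical values, NOT from TCGA-CRC, GSE50760, or GSE142279).
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumor, 0 = normal
expression = [2.1, 1.8, 3.0, 2.5, 6.2, 5.9, 2.8, 7.1]

# Assumption for this sketch: the gene is lower in tumors, so the
# expression is negated to make "higher score" mean "more tumor-like".
scores = [-x for x in expression]

auc = auc_from_scores(labels, scores)
print(f"AUC = {auc:.4f}")  # prints "AUC = 0.9375"
```

In this toy set, 15 of the 16 tumor/normal pairs are ranked correctly, which gives 0.9375. The quadratic pairwise loop is chosen for readability; for real expression matrices a rank-based implementation from an established library would be the more practical choice.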

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0794/10606805/6744d0202950/genes-14-01836-g001.jpg
