Machine Learning Analysis of RNA-seq Data for Diagnostic and Prognostic Prediction of Colon Cancer.

Affiliations

Department of Computer Engineering, Faculty of Engineering, Ankara University, 06830 Ankara, Turkey.

Department of Analytical Chemistry, Faculty of Gülhane Pharmacy, University of Health Sciences, 06018 Ankara, Turkey.

Publication information

Sensors (Basel). 2023 Mar 13;23(6):3080. doi: 10.3390/s23063080.

DOI: 10.3390/s23063080

PMID: 36991790

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10052105/

Abstract

Data from omics studies have been used to predict and classify various diseases in biomedical and bioinformatics research. In recent years, Machine Learning (ML) algorithms have been applied in many areas of healthcare, especially to disease prediction and classification tasks. Integrating molecular omics data with ML algorithms offers a valuable opportunity to evaluate clinical data. RNA sequencing (RNA-seq) has emerged as the gold standard for transcriptomic analysis and is now widely used in clinical research. In this work, RNA-seq data of extracellular vesicles (EVs) from healthy individuals and colon cancer patients are analyzed, with the aim of developing models that predict colon cancer and classify its stages. Five canonical ML classifiers and three Deep Learning (DL) classifiers are used to predict colon cancer in an individual from processed RNA-seq data. Class labels are defined in two ways: by cancer presence (healthy or cancer) and by colon cancer stage. The canonical ML classifiers, k-Nearest Neighbor (kNN), Logistic Model Tree (LMT), Random Tree (RT), Random Committee (RC), and Random Forest (RF), are tested with both forms of the data. For comparison with these models, One-Dimensional Convolutional Neural Network (1-D CNN), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM) DL models are also used, with their hyperparameters optimized by a genetic algorithm (GA), a meta-heuristic optimization method. In cancer prediction, the best accuracy among the canonical ML algorithms, 97.33%, is obtained with RC, LMT, and RF; RT and kNN each reach 95.33%. In cancer stage classification, the best accuracy, 97.33%, is achieved with RF, followed by LMT, RC, kNN, and RT with 96.33%, 96%, 94.66%, and 94%, respectively.
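The two labeling schemes described above can be illustrated with a minimal, stdlib-only sketch. It assumes toy three-feature expression vectors and hypothetical stage names (`stage_a`, `stage_b`), derives the binary healthy/cancer label from the stage annotation, and scores a from-scratch kNN classifier by leave-one-out accuracy. The abstract does not give the feature count, the value of k, or the validation protocol, so none of these choices reflects the authors' setup.

```python
import math
from collections import Counter

# Hypothetical toy data: each sample is a short vector of normalized
# expression values (real EV RNA-seq profiles have far more features)
# paired with a stage annotation. Stage names are illustrative only.
SAMPLES = [
    ([0.1, 0.2, 0.1], "healthy"),
    ([0.2, 0.1, 0.2], "healthy"),
    ([0.9, 0.8, 0.7], "stage_a"),
    ([0.8, 0.9, 0.8], "stage_a"),
    ([0.5, 0.9, 0.1], "stage_b"),
    ([0.6, 0.8, 0.2], "stage_b"),
]


def presence_label(stage: str) -> str:
    """Collapse a stage annotation into the binary healthy/cancer scheme."""
    return "healthy" if stage == "healthy" else "cancer"


def knn_predict(train, x, k=3):
    """Predict x's label by majority vote among its k nearest training
    samples, using Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leave_one_out_accuracy(data, k=3):
    """Hold out each sample in turn, train on the rest, and score kNN."""
    preds, truth = [], []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        preds.append(knn_predict(train, x, k))
        truth.append(y)
    return accuracy(truth, preds)


# The same samples under both labeling schemes.
stage_data = SAMPLES
binary_data = [(x, presence_label(s)) for x, s in SAMPLES]
print("stage accuracy: ", leave_one_out_accuracy(stage_data, k=1))
print("binary accuracy:", leave_one_out_accuracy(binary_data, k=1))
```

Keeping one feature matrix and switching only the label column is what lets every classifier be compared on "both forms of the data" under identical inputs.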
Among the DL models, the best accuracy in cancer prediction, 97.67%, is obtained with 1-D CNN, followed by BiLSTM (94.33%) and LSTM (93.67%). In cancer stage classification, the best accuracy, 98%, is achieved with BiLSTM, followed by 1-D CNN (97%) and LSTM (94.33%). The results indicate that neither family consistently dominates: canonical ML and DL models can each outperform the other, depending on the number of features used.
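The abstract states that the DL hyperparameters were tuned with a genetic algorithm but does not describe its encoding or operators. The sketch below shows one common GA variant (tournament selection, uniform crossover, random-reset mutation, elitism). It assumes a hypothetical discrete search space (`units`, `learning_rate`, `dropout`) and a stand-in `fitness` function in place of actually training a 1-D CNN or LSTM; it is not the authors' implementation.

```python
import random

# Hypothetical search space; the paper's actual hyperparameters and
# ranges are not stated in the abstract.
SEARCH_SPACE = {
    "units": [16, 32, 64, 128],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.2, 0.4],
}


def fitness(params):
    """Stand-in for validation accuracy. A real run would train the network
    with `params` and return its accuracy on held-out data."""
    score = 1.0
    score -= abs(params["units"] - 64) / 256
    score -= abs(params["learning_rate"] - 1e-3) * 10
    score -= abs(params["dropout"] - 0.2)
    return score


def random_individual(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    return {key: (a if rng.random() < 0.5 else b)[key] for key in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
        for key, value in ind.items()
    }


def tournament(pop, rng, size=3):
    """Return the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(pop, size), key=fitness)


def genetic_search(pop_size=10, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the best survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    best = max(pop, key=fitness)
    return best, history


best, history = genetic_search()
print(best, round(fitness(best), 4))
```

Because the elite is copied into every new generation, the best fitness never decreases, which matters when each fitness evaluation is an expensive network training run.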

Figure (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e80/10052105/f280ab827f0b/sensors-23-03080-g001.jpg

Similar articles

DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning.

BMC Bioinformatics. 2022 Jan 6;23(1):17. doi: 10.1186/s12859-021-04527-4.

Machine Learning Classifiers for Twitter Surveillance of Vaping: Comparative Machine Learning Study.

J Med Internet Res. 2020 Aug 12;22(8):e17478. doi: 10.2196/17478.

Development of an efficient novel method for coronary artery disease prediction using machine learning and deep learning techniques.

Technol Health Care. 2024;32(6):4545-4569. doi: 10.3233/THC-240740.

Current-Visit and Next-Visit Prediction for Fatty Liver Disease With a Large-Scale Dataset: Model Development and Performance Comparison.

JMIR Med Inform. 2021 Aug 12;9(8):e26398. doi: 10.2196/26398.

EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction.

BMC Bioinformatics. 2022 Jun 8;23(1):221. doi: 10.1186/s12859-022-04756-1.

Artificial Intelligence Algorithm-Based Economic Denial of Sustainability Attack Detection Systems: Cloud Computing Environments.

Sensors (Basel). 2022 Jun 21;22(13):4685. doi: 10.3390/s22134685.

CRlncRC: a machine learning-based method for cancer-related long noncoding RNA identification using integrated features.

BMC Med Genomics. 2018 Dec 31;11(Suppl 6):120. doi: 10.1186/s12920-018-0436-9.

Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines.

Comput Methods Programs Biomed. 2014 Mar;113(3):792-808. doi: 10.1016/j.cmpb.2014.01.001. Epub 2014 Jan 10.

Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning.

BMC Cancer. 2021 Aug 27;21(1):962. doi: 10.1186/s12885-021-08704-9.

Cited by

Integrating miRNA profiling and machine learning for improved prostate cancer diagnosis.

Sci Rep. 2025 Aug 20;15(1):30477. doi: 10.1038/s41598-025-99754-7.

Thyroid disease classification using generative adversarial networks and Kolmogorov-Arnold network for three-class classification.

BMC Med Inform Decis Mak. 2025 Jul 31;25(1):284. doi: 10.1186/s12911-025-03014-7.

Machine learning driven dashboard for chronic myeloid leukemia prediction using protein sequences.

PLoS One. 2025 Jun 18;20(6):e0321761. doi: 10.1371/journal.pone.0321761. eCollection 2025.

Advanced machine learning framework for enhancing breast cancer diagnostics through transcriptomic profiling.

Discov Oncol. 2025 Mar 17;16(1):334. doi: 10.1007/s12672-025-02111-3.

Development of machine learning models for the prediction of the skin sensitization potential of cosmetic compounds.

PeerJ. 2024 Dec 13;12:e18672. doi: 10.7717/peerj.18672. eCollection 2024.

Improving platelet-RNA-based diagnostics: a comparative analysis of machine learning models for cancer detection and multiclass classification.

Mol Oncol. 2024 Nov;18(11):2743-2754. doi: 10.1002/1878-0261.13689. Epub 2024 Jun 17.

Classification of Long Non-Coding RNAs s Between Early and Late Stage of Liver Cancers From Non-coding RNA Profiles Using Machine-Learning Approach.

Bioinform Biol Insights. 2024 Jun 5;18:11779322241258586. doi: 10.1177/11779322241258586. eCollection 2024.

Computational approaches in rheumatic diseases - Deciphering complex spatio-temporal cell interactions.

Comput Struct Biotechnol J. 2023 Aug 6;21:4009-4020. doi: 10.1016/j.csbj.2023.08.005. eCollection 2023.
