Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions.

Author information

Mahendran Nivedhitha, Durai Raj Vincent P M, Srinivasan Kathiravan, Chang Chuan-Yu

Affiliations

School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India.

Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan.

Publication information

Front Genet. 2020 Dec 10;11:603808. doi: 10.3389/fgene.2020.603808. eCollection 2020.

DOI:10.3389/fgene.2020.603808

PMID:33362861

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7758324/

Abstract

Gene expression is the process by which the information encoded in a gene is used to synthesize functional products, chiefly proteins, that determine the physical characteristics of living beings. It takes place in two steps, transcription and translation: with the help of enzymes, information flows from DNA to RNA, and the end products are proteins and other biochemical molecules. Many technologies can capture gene expression from DNA or RNA; one such technique is the DNA microarray. Besides being expensive, the main issue with DNA microarrays is that they generate high-dimensional data from a very small number of samples. A learning model trained on such a dataset tends to overfit, so the dimensionality of the data should be reduced substantially. In recent years, machine learning has gained popularity in genomic studies, and many machine learning-based gene selection approaches have been proposed in the literature to improve the precision of dimensionality reduction. This paper extensively reviews recent work on machine learning-based gene selection, along with an analysis of its performance. The study categorizes feature selection algorithms as supervised, unsupervised, or semi-supervised. Recent work on reducing the number of features for tumor diagnosis is discussed in detail, and the performance of several methods from the literature is analyzed. The study also lists and briefly discusses open issues in handling high-dimensional, small-sample data.
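To make the idea of supervised gene selection on high-dimensional, small-sample data concrete, the following is a minimal sketch of a filter-style selector. It is not a method taken from the paper: the choice of an absolute Welch t-statistic as the ranking score, the function names (`welch_t_scores`, `select_top_genes`, `make_toy_data`), and the synthetic data with three artificially shifted genes are all assumptions made for illustration.

```python
import random
import statistics


def welch_t_scores(samples, labels):
    """Return |Welch t-statistic| per gene (column) for a two-class dataset.

    samples: list of rows, one row of expression values per sample.
    labels:  list of 0/1 class labels, one per sample.
    Each class needs at least two samples so a variance can be computed.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, y in zip(samples, labels) if y == 0]
        b = [row[g] for row, y in zip(samples, labels) if y == 1]
        # Standard error of the difference in class means (unequal variances).
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        diff = abs(statistics.mean(a) - statistics.mean(b))
        scores.append(diff / se if se > 0 else 0.0)
    return scores


def select_top_genes(samples, labels, k):
    """Rank genes by score and return the indices of the k highest-scoring ones."""
    scores = welch_t_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def make_toy_data(n_samples=20, n_genes=500, informative=(3, 42, 77), shift=8.0, seed=0):
    """Hypothetical data: many noise genes, few samples, a few shifted genes."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    samples = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += shift * y  # only class 1 is shifted on informative genes
        samples.append(row)
    return samples, labels


if __name__ == "__main__":
    X, y = make_toy_data()
    # 500 genes but only 20 samples: keep the 3 best-separated genes.
    print(sorted(select_top_genes(X, y, k=3)))
```

A filter like this scores each gene independently of any classifier, which keeps it cheap when genes vastly outnumber samples. It ignores interactions between genes, which is one reason wrapper, embedded, and hybrid approaches are also used for gene selection.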

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d654/7758324/765eb5dbad29/fgene-11-603808-g001.jpg

Similar articles

Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection.

IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):971-989. doi: 10.1109/TCBB.2015.2478454. Epub 2015 Sep 14.

Filter versus wrapper gene selection approaches in DNA microarray domains.

Artif Intell Med. 2004 Jun;31(2):91-103. doi: 10.1016/j.artmed.2004.01.007.

An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples.

PeerJ Comput Sci. 2021 Sep 16;7:e671. doi: 10.7717/peerj-cs.671. eCollection 2021.

Gene targeting in amyotrophic lateral sclerosis using causality-based feature selection and machine learning.

Mol Med. 2023 Jan 24;29(1):12. doi: 10.1186/s10020-023-00603-y.

Neurodynamics-driven holistic approaches to semi-supervised feature selection.

Neural Netw. 2023 Jan;157:377-386. doi: 10.1016/j.neunet.2022.10.029. Epub 2022 Nov 3.

Incorporating feature ranking and evolutionary methods for the classification of high-dimensional DNA microarray gene expression data.

Australas Med J. 2013 May 30;6(5):272-9. doi: 10.4066/AMJ.2013.1641. Print 2013.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

Towards an Optimized Ensemble Feature Selection for DDoS Detection Using Both Supervised and Unsupervised Method.

Sensors (Basel). 2022 Nov 25;22(23):9144. doi: 10.3390/s22239144.

Cited by

Predictive prioritization of genes significantly associated with biotic and abiotic stresses in maize using machine learning algorithms.

Front Plant Sci. 2025 Jun 19;16:1611784. doi: 10.3389/fpls.2025.1611784. eCollection 2025.

Identification of Endoplasmic Reticulum Stress-Related Genes in Acute Myocardial Infarction: A Bioinformatics Approach with Experimental Validation.

Biochem Genet. 2025 May 3. doi: 10.1007/s10528-025-11121-3.

A comparative analysis of gene expression profiling by statistical and machine learning approaches.

Bioinform Adv. 2024 Dec 18;5(1):vbae199. doi: 10.1093/bioadv/vbae199. eCollection 2025.

Machine Learning Meets Meta-Heuristics: Bald Eagle Search Optimization and Red Deer Optimization for Feature Selection in Type II Diabetes Diagnosis.

Bioengineering (Basel). 2024 Jul 29;11(8):766. doi: 10.3390/bioengineering11080766.

Elucidating the role of angiogenesis-related genes in colorectal cancer: a multi-omics analysis.

Front Oncol. 2024 Jun 19;14:1413273. doi: 10.3389/fonc.2024.1413273. eCollection 2024.

FPR1, as a Potential Biomarker of Diagnosis and Infliximab Therapy Responses for Crohn's Disease, is Related to Disease Activity, Inflammation and Macrophage Polarization.

J Inflamm Res. 2024 Jun 19;17:3949-3966. doi: 10.2147/JIR.S459819. eCollection 2024.

Cell-type specific inference from bulk RNA-sequencing data by integrating single cell reference profiles via EPIC-unmix.

bioRxiv. 2024 May 24:2024.05.23.595514. doi: 10.1101/2024.05.23.595514.

Application of Machine Learning in Predicting Hepatic Metastasis or Primary Site in Gastroenteropancreatic Neuroendocrine Tumors.

Curr Oncol. 2023 Oct 19;30(10):9244-9261. doi: 10.3390/curroncol30100668.

Integrated transcriptomic meta-analysis and comparative artificial intelligence models in maize under biotic stress.

Sci Rep. 2023 Sep 23;13(1):15899. doi: 10.1038/s41598-023-42984-4.

Unraveling the mechanisms underlying drug-induced cholestatic liver injury: identifying key genes using machine learning techniques on human in vitro data sets.

Arch Toxicol. 2023 Nov;97(11):2969-2981. doi: 10.1007/s00204-023-03583-4. Epub 2023 Aug 21.
