Hybrid deep learning approach to improve classification of low-volume high-dimensional data.

Affiliations

School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA.

School of Biological Sciences, Center for Reproductive Biology, Washington State University, Pullman, WA, 99164-4236, USA.

Publication information

BMC Bioinformatics. 2023 Nov 7;24(1):419. doi: 10.1186/s12859-023-05557-w.

DOI: 10.1186/s12859-023-05557-w

PMID: 37936066

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10631218/

Abstract

BACKGROUND

The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well to high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit both challenges: numerous handcrafted features (high dimensionality) and small amounts of training data (low volume).

METHOD

A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification.
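The four steps of the method (train a deep network, extract features from one of its layers, re-express the data with those features, train a non-deep classifier on them) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-hidden-layer network `SmallMLP`, the nearest-centroid classifier standing in for the non-deep learner, the helper names, the toy data, and all hyperparameters are hypothetical choices made for brevity.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SmallMLP:
    """Hypothetical 'deep' network: input -> tanh -> tanh -> sigmoid (binary output)."""

    def __init__(self, n_in, sizes=(8, 4), seed=0):
        rng = random.Random(seed)
        dims = [n_in, *sizes, 1]
        # W[i] maps layer i to layer i + 1 and has dims[i + 1] rows of dims[i] weights.
        self.W = [[[rng.gauss(0.0, 1.0 / math.sqrt(dims[i])) for _ in range(dims[i])]
                   for _ in range(dims[i + 1])] for i in range(len(dims) - 1)]
        self.b = [[0.0] * dims[i + 1] for i in range(len(dims) - 1)]

    def activations(self, x):
        """Return [input, hidden_1, hidden_2, [p(class 1)]]."""
        acts = [list(x)]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = [sum(w * a for w, a in zip(row, acts[-1])) + bj for row, bj in zip(W, b)]
            is_output = i == len(self.W) - 1
            acts.append([_sigmoid(v) for v in z] if is_output else [math.tanh(v) for v in z])
        return acts

    def fit(self, X, y, epochs=100, lr=0.05):
        """Per-sample gradient descent on binary cross-entropy (plain backpropagation)."""
        for _ in range(epochs):
            for x, t in zip(X, y):
                acts = self.activations(x)
                delta = [acts[-1][0] - t]  # dLoss/dz at the sigmoid output
                for i in reversed(range(len(self.W))):
                    prev = acts[i]
                    # Gradient w.r.t. layer i's activations, taken before the weights change.
                    back = [sum(self.W[i][j][k] * d for j, d in enumerate(delta))
                            for k in range(len(prev))]
                    for j, d in enumerate(delta):
                        for k, a in enumerate(prev):
                            self.W[i][j][k] -= lr * d * a
                        self.b[i][j] -= lr * d
                    if i == 0:
                        break  # no activation function on the input layer
                    # Chain rule through tanh: tanh'(z) = 1 - tanh(z)^2.
                    delta = [g * (1.0 - a * a) for g, a in zip(back, prev)]
        return self


def extract_features(net, X, layer):
    """Re-express each sample as the activations of one hidden layer (1 = first)."""
    return [net.activations(x)[layer] for x in X]


class NearestCentroid:
    """Stand-in non-deep learner: predict the class whose mean feature vector is closest."""

    def fit(self, F, y):
        self.centroids = {}
        for c in set(y):
            rows = [f for f, t in zip(F, y) if t == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, F):
        def dist2(f, c):
            return sum((a - b) ** 2 for a, b in zip(f, self.centroids[c]))
        return [min(self.centroids, key=lambda c: dist2(f, c)) for f in F]


def hybrid_fit_predict(X_train, y_train, X_test, layer=2, seed=0):
    net = SmallMLP(len(X_train[0]), seed=seed).fit(X_train, y_train)  # 1. train deep net
    F_train = extract_features(net, X_train, layer)                    # 2. extract features
    F_test = extract_features(net, X_test, layer)                      # 3. re-express data
    clf = NearestCentroid().fit(F_train, y_train)                      # 4. non-deep learner
    return clf.predict(F_test)


# Toy low-volume, high-dimensional data (hypothetical): 30 features, only 3 informative.
rng = random.Random(1)

def sample(label):
    shift = 1.5 if label == 1 else -1.5
    return [rng.gauss(shift if k < 3 else 0.0, 1.0) for k in range(30)]

y_train = [i % 2 for i in range(24)]
X_train = [sample(t) for t in y_train]
y_test = [i % 2 for i in range(10)]
X_test = [sample(t) for t in y_test]
pred = hybrid_fit_predict(X_train, y_train, X_test, layer=2)
print("test accuracy:", sum(p == t for p, t in zip(pred, y_test)) / len(y_test))
```

Passing `layer=1` instead of `layer=2` extracts the first hidden layer instead; repeating the run for each candidate layer and comparing held-out accuracy is one simple way to approach the layer-selection question raised in the RESULTS section.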

RESULTS

The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features, and the training-data volume threshold that determines when this approach is preferred. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains.
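Choosing the extraction layer can be framed as a held-out comparison: fit the pipeline once per candidate layer on a training split and keep the layer with the best validation accuracy. The sketch below assumes that framing; the abstract does not give the paper's exact evaluation protocol. `holdout_split`, `select_layer`, and the toy `threshold_stub` (which treats a "layer" as a single feature column so the block runs on its own) are hypothetical; in practice `fit_predict` would wrap the full deep-then-non-deep pipeline.

```python
import random


def holdout_split(n, val_frac=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, validation) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - val_frac))
    return idx[:cut], idx[cut:]


def select_layer(X, y, candidate_layers, fit_predict, val_frac=0.25, seed=0):
    """Score each candidate layer on one held-out split; return (best_layer, scores).

    fit_predict(X_train, y_train, X_val, layer) must return predicted labels for X_val.
    """
    tr, va = holdout_split(len(X), val_frac, seed)
    X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
    X_va, y_va = [X[i] for i in va], [y[i] for i in va]
    scores = {}
    for layer in candidate_layers:
        pred = fit_predict(X_tr, y_tr, X_va, layer)
        scores[layer] = sum(p == t for p, t in zip(pred, y_va)) / len(y_va)
    best = max(candidate_layers, key=lambda layer: scores[layer])  # ties go to the earliest
    return best, scores


def threshold_stub(X_tr, y_tr, X_va, layer):
    """Toy stand-in for the hybrid pipeline: here a 'layer' is one feature column,
    and the classifier ignores the training data and thresholds that column at 0.5."""
    return [int(x[layer] > 0.5) for x in X_va]


# Toy data (hypothetical): column 1 equals the label, column 0 is its complement.
y = [i % 2 for i in range(40)]
X = [[1.0 - t, float(t)] for t in y]
best, scores = select_layer(X, y, candidate_layers=[0, 1], fit_predict=threshold_stub)
print(best, scores)  # prints: 1 {0: 0.0, 1: 1.0}
```

Rerunning the same comparison on training subsets of increasing size, with a standalone deep or non-deep model as an alternative `fit_predict`, is one way to look for a data-volume threshold of the kind the results describe.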

CONCLUSIONS

The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low-volume learning tasks that are typical in biology domains.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75be/10631218/235b9f98bade/12859_2023_5557_Fig1_HTML.jpg

Similar articles

Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.

Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.

Multilevel hybrid handcrafted feature extraction based depression recognition method using speech.

J Affect Disord. 2024 Nov 1;364:9-19. doi: 10.1016/j.jad.2024.08.002. Epub 2024 Aug 9.

deepNEC: a novel alignment-free tool for the identification and classification of nitrogen biochemical network-related enzymes using deep learning.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac071.

Brain tumor classification for MR images using transfer learning and fine-tuning.

Comput Med Imaging Graph. 2019 Jul;75:34-46. doi: 10.1016/j.compmedimag.2019.05.001. Epub 2019 May 18.

BioDeepfuse: a hybrid deep learning approach with integrated feature extraction techniques for enhanced non-coding RNA classification.

RNA Biol. 2024 Jan;21(1):1-12. doi: 10.1080/15476286.2024.2329451. Epub 2024 Mar 25.

Predicting environmentally responsive transgenerational differential DNA methylated regions (epimutations) in the genome using a hybrid deep-machine learning approach.

BMC Bioinformatics. 2021 Nov 30;22(1):575. doi: 10.1186/s12859-021-04491-z.

Cross-dataset transfer learning for motor imagery signal classification via multi-task learning and pre-training.

J Neural Eng. 2023 Oct 20;20(5). doi: 10.1088/1741-2552/acfe9c.

Hybrid Feature-Learning-Based PSO-PCA Feature Engineering Approach for Blood Cancer Classification.

Diagnostics (Basel). 2023 Aug 14;13(16):2672. doi: 10.3390/diagnostics13162672.

LwF-ECG: Learning-without-forgetting approach for electrocardiogram heartbeat classification based on memory with task selector.

Comput Biol Med. 2021 Oct;137:104807. doi: 10.1016/j.compbiomed.2021.104807. Epub 2021 Aug 27.

Cited by

Deep learning based deconvolution methods: A systematic review.

Comput Struct Biotechnol J. 2025 Jun 11;27:2544-2565. doi: 10.1016/j.csbj.2025.05.038. eCollection 2025.

Hybrid time series and machine learning models for forecasting cardiovascular mortality in India: an age specific analysis.

BMC Public Health. 2025 Jun 10;25(1):2150. doi: 10.1186/s12889-025-23318-7.

Advancing precision oncology with AI-powered genomic analysis.

Front Pharmacol. 2025 Apr 30;16:1591696. doi: 10.3389/fphar.2025.1591696. eCollection 2025.

Breaking new ground: machine learning enhances survival forecasts in hypercapnic respiratory failure.

Front Med (Lausanne). 2025 Feb 20;12:1497651. doi: 10.3389/fmed.2025.1497651. eCollection 2025.

Deep-learning-ready RGB-depth images of seedling development.

Plant Methods. 2025 Feb 11;21(1):16. doi: 10.1186/s13007-025-01334-3.

A hybrid machine learning framework for functional annotation of mitochondrial glutathione transport and metabolism proteins in cancers.

BMC Bioinformatics. 2025 Feb 11;26(1):48. doi: 10.1186/s12859-025-06051-1.

