Hybrid deep learning approach to improve classification of low-volume high-dimensional data.

Affiliations

School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA.

School of Biological Sciences, Center for Reproductive Biology, Washington State University, Pullman, WA, 99164-4236, USA.

Publication information

BMC Bioinformatics. 2023 Nov 7;24(1):419. doi: 10.1186/s12859-023-05557-w.

Abstract

BACKGROUND

The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well to high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit both challenges: numerous handcrafted features (high-dimensional) and small amounts of training data (low-volume).

METHOD

A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification.
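
The three steps described above (train a network, extract features from it, train a non-deep learner on those features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic toy data, the single tanh hidden layer, the training settings, and the choice of k-nearest neighbours as the non-deep method. It uses only the Python standard library so that it runs as-is.

```python
import math
import random

random.seed(0)

def make_data(n, d):
    """Hypothetical toy data: few samples (n), many features (d)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [random.gauss(0, 1) for _ in range(d)]
        # Only the first three features carry signal; the rest are noise.
        for j in range(3):
            row[j] += 2.0 if label else -2.0
        X.append(row)
        y.append(label)
    return X, y

def hidden(x, W1, b1):
    """Hidden-layer activations: these serve as the extracted features."""
    return [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
            for ws, b in zip(W1, b1)]

def train_network(X, y, n_hidden=4, epochs=50, lr=0.05):
    """Step 1: train a small network (one hidden layer, sigmoid output) by SGD."""
    d = len(X[0])
    W1 = [[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [random.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = hidden(x, W1, b1)
            z = sum(w * a for w, a in zip(W2, h)) + b2
            p = 1.0 / (1.0 + math.exp(-z))
            g_out = p - t  # cross-entropy gradient w.r.t. the output pre-activation
            for k in range(n_hidden):
                g_h = g_out * W2[k] * (1.0 - h[k] ** 2)  # backprop through tanh
                W2[k] -= lr * g_out * h[k]
                for j in range(d):
                    W1[k][j] -= lr * g_h * x[j]
                b1[k] -= lr * g_h
            b2 -= lr * g_out
    return W1, b1

def knn_predict(train_F, train_y, f, k=3):
    """Step 3: a non-deep classifier (k-nearest neighbours) on the new features."""
    order = sorted(range(len(train_F)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_F[i], f)))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

X, y = make_data(n=40, d=20)
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

W1, b1 = train_network(X_train, y_train)
# Step 2: re-express every sample through the trained hidden layer.
F_train = [hidden(x, W1, b1) for x in X_train]
F_test = [hidden(x, W1, b1) for x in X_test]

preds = [knn_predict(F_train, y_train, f) for f in F_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"hybrid accuracy on held-out toy data: {accuracy:.2f}")
```

In this sketch the network's output layer is used only during training; the final prediction comes from the non-deep learner applied to the 4-dimensional hidden representation rather than the original 20 features. With a deeper network, the same pattern would apply to whichever layer is chosen for extraction.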

RESULTS

The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume at which this approach becomes preferable. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains.

CONCLUSIONS

The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on the high-dimensional, low-volume learning tasks that are typical in biology domains.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/75be/10631218/235b9f98bade/12859_2023_5557_Fig1_HTML.jpg
