Large-scale classification of metagenomic samples: a comparative analysis of classical machine learning techniques vs a novel brain-inspired hyperdimensional computing approach.

Authors

Joshi Jayadev, Cumbo Fabio, Blankenberg Daniel

Affiliations

Center for Computational Life Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA.

Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA.

Publication information

bioRxiv. 2025 Jul 7:2025.07.06.663394. doi: 10.1101/2025.07.06.663394.

Abstract

Classical machine learning techniques have revolutionized bioinformatics, enabling researchers to extract knowledge from complex biological data. However, these techniques often struggle with high-dimensional data, where a growing number of features degrades performance and model accuracy. To address this problem, we explore hyperdimensional computing (HDC), an emerging brain-inspired computational paradigm that uses high-dimensional vectors and simple arithmetic operations to represent and manipulate complex patterns, as an alternative approach to supervised machine learning. In this work, we present a comprehensive comparative analysis of HDC against established machine learning techniques across a range of classification tasks. As a representative use case, we focus on classifying heterogeneous metagenomic samples based on their quantitative microbial profiles, using publicly available microbiome datasets. Our results demonstrate that HDC achieves classification accuracy comparable, and in some cases superior, to that of classical methods. Furthermore, our findings highlight the potential of HDC for improved computational efficiency, particularly on large-scale datasets, suggesting that the HDC-based classifier is a promising tool for bioinformatics research, especially in areas characterized by high-dimensional data. We also provide a Galaxy-powered toolset for analyzing your own datasets, generating reproducible workflows, and adopting these methods in your own research. Our investigation into the application of an HDC-based supervised machine learning technique for classifying microbial profiles in metagenomic samples yielded promising results, demonstrating the potential of this novel computational paradigm to complement and, in some cases, surpass the performance of well-established machine learning techniques.
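To make the idea of "high-dimensional vectors and simple arithmetic operations" concrete, the following is a minimal sketch of a generic HDC classifier. It is not the authors' implementation. The hypervector length `DIM`, the number of quantization levels `LEVELS`, the record-based encoding (bind a feature vector with a level vector, then sum), and the synthetic `toy_data` helper are all assumptions made for illustration.

```python
import numpy as np

DIM = 10_000   # hypervector length (assumed; not taken from the paper)
LEVELS = 10    # quantization levels for abundance values (assumed)
rng = np.random.default_rng(0)


def random_hvs(n):
    """Return n random bipolar (+1/-1) hypervectors, one per row."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=(n, DIM))


def level_hvs(n_levels):
    """Level hypervectors: each level flips a fresh slice of positions,
    so nearby levels stay similar and distant levels become dissimilar."""
    hvs = np.empty((n_levels, DIM), dtype=np.int8)
    hvs[0] = random_hvs(1)[0]
    order = rng.permutation(DIM)
    step = DIM // (2 * (n_levels - 1))
    for i in range(1, n_levels):
        hvs[i] = hvs[i - 1]
        hvs[i, order[(i - 1) * step : i * step]] *= -1
    return hvs


def encode(X, feature_hvs, lvl_hvs):
    """Encode each row of X (values in [0, 1]) as one integer hypervector."""
    idx = np.minimum((X * len(lvl_hvs)).astype(int), len(lvl_hvs) - 1)
    out = np.empty((X.shape[0], DIM), dtype=np.int32)
    for r in range(X.shape[0]):
        bound = feature_hvs * lvl_hvs[idx[r]]        # bind: elementwise multiply
        out[r] = bound.sum(axis=0, dtype=np.int32)   # bundle: elementwise sum
    return out


def fit(H, y):
    """Class prototypes: sum of the encoded training samples of each class."""
    classes = np.unique(y)
    protos = np.stack([H[y == c].sum(axis=0) for c in classes])
    return classes, protos


def predict(H, classes, protos):
    """Assign each sample to the prototype with the highest cosine similarity."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    Pn = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return classes[np.argmax(Hn @ Pn.T, axis=1)]


def toy_data(n_per_class, n_features, seed):
    """Synthetic two-class 'profiles' in [0, 1]; purely illustrative."""
    g = np.random.default_rng(seed)
    half = n_features // 2
    hi = g.uniform(0.7, 1.0, size=(2 * n_per_class, n_features))
    lo = g.uniform(0.0, 0.3, size=(2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)
    mask = np.zeros((2 * n_per_class, n_features), dtype=bool)
    mask[y == 0, :half] = True   # class 0: first half of features abundant
    mask[y == 1, half:] = True   # class 1: second half of features abundant
    return np.where(mask, hi, lo), y


features = random_hvs(20)
levels = level_hvs(LEVELS)
X_train, y_train = toy_data(20, 20, seed=1)
X_test, y_test = toy_data(10, 20, seed=2)
classes, protos = fit(encode(X_train, features, levels), y_train)
pred = predict(encode(X_test, features, levels), classes, protos)
print("toy accuracy:", (pred == y_test).mean())
```

Training here is a single pass of additions and inference is one similarity lookup per class, which is the kind of simple arithmetic the abstract refers to. Real microbial profiles would first need scaling to [0, 1], and the toy data says nothing about performance on real datasets.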

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bfa/12265723/fcc445472d30/nihpp-2025.07.06.663394v1-f0001.jpg
