HyperSpec: Ultrafast Mass Spectra Clustering in Hyperdimensional Space.

Affiliations

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States.

Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, California 92093, United States.

Publication information

J Proteome Res. 2023 Jun 2;22(6):1639-1648. doi: 10.1021/acs.jproteome.2c00612. Epub 2023 May 11.

Abstract

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while requiring only lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with clustering quality comparable to that of state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.
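
The abstract does not describe how HyperSpec encodes spectra, so the following is only a minimal sketch of the general idea behind HDC-based similarity, not the tool's implementation. It assumes a generic ID-level style encoding: each m/z bin gets a random binary hypervector, each intensity level gets a correlated one, the two are bound with XOR, and the peaks are bundled by bitwise majority. All names and parameters (`D`, `N_BINS`, `N_LEVELS`, `encode`, the bin width, the example peaks) are hypothetical choices made for illustration.

```python
import random

D = 2048          # hypervector dimensionality (assumed value)
N_BINS = 1500     # number of m/z bins (assumed value)
N_LEVELS = 16     # number of intensity quantization levels (assumed value)
MZ_MIN = 100.0    # lowest m/z covered by the bins (assumed value)
BIN_WIDTH = 1.0   # m/z bin width (assumed value)


def hamming(a, b):
    """Hamming distance between two hypervectors stored as Python ints."""
    return bin(a ^ b).count("1")


def make_level_hvs(rng, n_levels):
    """Correlated level hypervectors: neighbouring levels differ in few bits."""
    hv = rng.getrandbits(D)
    levels = [hv]
    order = list(range(D))
    rng.shuffle(order)
    step = (D // 2) // (n_levels - 1)
    for i in range(1, n_levels):
        # Flip a fresh, non-overlapping set of bits for each new level.
        for bit in order[(i - 1) * step:i * step]:
            hv ^= 1 << bit
        levels.append(hv)
    return levels


def encode(peaks, id_hvs, level_hvs):
    """Encode a list of (m/z, intensity) peaks into one binary hypervector."""
    counts = [0] * D
    max_intensity = max(intensity for _, intensity in peaks)
    for mz, intensity in peaks:
        mz_bin = int((mz - MZ_MIN) / BIN_WIDTH)
        level = min(int(intensity / max_intensity * len(level_hvs)),
                    len(level_hvs) - 1)
        bound = id_hvs[mz_bin] ^ level_hvs[level]   # binding via XOR
        for k in range(D):
            counts[k] += (bound >> k) & 1
    # Bundling via bitwise majority vote (ties resolve to 0).
    threshold = len(peaks) / 2
    out = 0
    for k in range(D):
        if counts[k] > threshold:
            out |= 1 << k
    return out


rng = random.Random(0)
id_hvs = [rng.getrandbits(D) for _ in range(N_BINS)]
level_hvs = make_level_hvs(rng, N_LEVELS)

# Two near-duplicate spectra and one unrelated spectrum (made-up peaks).
spec_a = [(150.2, 100), (300.5, 40), (450.3, 75), (600.4, 20), (750.6, 60)]
spec_b = [(150.3, 100), (300.6, 42), (450.4, 73), (600.5, 21), (750.7, 58)]
spec_c = [(210.1, 80), (333.3, 100), (512.7, 35), (820.9, 55), (990.2, 10)]

hv_a = encode(spec_a, id_hvs, level_hvs)
hv_b = encode(spec_b, id_hvs, level_hvs)
hv_c = encode(spec_c, id_hvs, level_hvs)

dist_ab = hamming(hv_a, hv_b)
dist_ac = hamming(hv_a, hv_c)
print("near-duplicate distance:", dist_ab)
print("unrelated distance:", dist_ac)
```

In this sketch the near-duplicate pair ends up much closer in Hamming distance than the unrelated pair, and the comparison itself is a single XOR followed by a bit count. That kind of lightweight, highly parallel binary operation is what the abstract refers to when it says HDC maps well onto GPUs; a clustering step would then group hypervectors whose distance falls below some chosen threshold.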

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/28a8fa5d8e5f/pr2c00612_0001.jpg