HyperSpec: Ultrafast Mass Spectra Clustering in Hyperdimensional Space.

Affiliations

Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States.

Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, California 92093, United States.

Publication information

J Proteome Res. 2023 Jun 2;22(6):1639-1648. doi: 10.1021/acs.jproteome.2c00612. Epub 2023 May 11.

DOI:10.1021/acs.jproteome.2c00612
PMID:37166120
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10243109/
Abstract

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while only requiring lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with clustering quality comparable to that of state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.
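
To make the idea of "lightweight binary operations" concrete, the following is a minimal sketch of a generic HDC encoding for a mass spectrum, not HyperSpec's actual implementation. It assumes an ID-level scheme: each m/z bin gets a random binary ID hypervector, each quantized intensity level gets a level hypervector, the two are bound with XOR, and all peaks are bundled by a per-dimension majority vote. The dimensionality, bin count, level count, m/z range, and all function names are illustrative assumptions. Similar spectra should then lie at a small Hamming distance from each other.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
D = 2048                        # hypervector dimensionality
N_BINS = 200                    # number of m/z bins
N_LEVELS = 8                    # number of intensity quantization levels
MZ_MIN, MZ_MAX = 100.0, 1500.0  # m/z range covered by the bins

rng = np.random.default_rng(0)

# One random binary ID hypervector per m/z bin.
ID_HVS = rng.integers(0, 2, size=(N_BINS, D), dtype=np.uint8)


def make_level_hvs(n_levels, dim, rng):
    """Build level hypervectors in which neighbouring levels differ in few bits."""
    levels = np.empty((n_levels, dim), dtype=np.uint8)
    levels[0] = rng.integers(0, 2, size=dim, dtype=np.uint8)
    flip_order = rng.permutation(dim)
    step = dim // (2 * (n_levels - 1))  # lowest and highest level differ in ~dim/2 bits
    for i in range(1, n_levels):
        levels[i] = levels[i - 1]
        idx = flip_order[(i - 1) * step : i * step]
        levels[i, idx] ^= 1  # flip a fresh block of bits for each level
    return levels


LEVEL_HVS = make_level_hvs(N_LEVELS, D, rng)


def encode_spectrum(mz, intensity):
    """Encode peaks (m/z, intensity) into one binary hypervector of length D."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = (mz >= MZ_MIN) & (mz < MZ_MAX)
    mz, intensity = mz[keep], intensity[keep]
    if mz.size == 0:
        raise ValueError("no peaks inside the m/z range")
    bins = ((mz - MZ_MIN) / (MZ_MAX - MZ_MIN) * N_BINS).astype(int)
    rel = intensity / intensity.max()
    levels = np.minimum((rel * N_LEVELS).astype(int), N_LEVELS - 1)
    bound = ID_HVS[bins] ^ LEVEL_HVS[levels]  # bind position and intensity via XOR
    # Bundle by majority vote per dimension; ties resolve to 0.
    return (2 * bound.sum(axis=0, dtype=np.int64) > len(bound)).astype(np.uint8)


def hamming(a, b):
    """Normalized Hamming distance between two binary hypervectors."""
    return np.count_nonzero(a != b) / a.size


# Spectrum b repeats spectrum a with one changed intensity; spectrum c is unrelated.
mz_a = [175.1, 300.2, 450.3, 600.4, 750.5, 900.6]
hv_a = encode_spectrum(mz_a, [100, 80, 60, 40, 20, 10])
hv_b = encode_spectrum(mz_a, [100, 70, 60, 40, 20, 10])
hv_c = encode_spectrum([210.0, 333.3, 512.7, 680.2, 845.9, 1100.4],
                       [30, 100, 55, 15, 70, 25])

d_ab = hamming(hv_a, hv_b)
d_ac = hamming(hv_a, hv_c)
print(f"distance a-b: {d_ab:.3f}, distance a-c: {d_ac:.3f}")
```

Because encoding and comparison reduce to XOR, counting, and thresholding over fixed-length bit vectors, every spectrum and every pair can be processed independently, which is what makes this style of computation a natural fit for GPUs. A clustering step would then group spectra whose pairwise Hamming distance falls below a chosen threshold.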

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/05a88e6e3518/pr2c00612_0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/28a8fa5d8e5f/pr2c00612_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/f21125a05c68/pr2c00612_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/d66f4f2d7b5a/pr2c00612_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/998041b6f4e9/pr2c00612_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/7bd83e978b21/pr2c00612_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/9c4142de7b40/pr2c00612_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2887/10243109/41a4e331c28b/pr2c00612_0007.jpg

Similar articles

1
HyperSpec: Ultrafast Mass Spectra Clustering in Hyperdimensional Space.
J Proteome Res. 2023 Jun 2;22(6):1639-1648. doi: 10.1021/acs.jproteome.2c00612. Epub 2023 May 11.
2
msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing.
J Proteome Res. 2019 Jan 4;18(1):147-158. doi: 10.1021/acs.jproteome.8b00448. Epub 2018 Dec 14.
3
ClusterSheep: A Graphics Processing Unit-Accelerated Software Tool for Large-Scale Clustering of Tandem Mass Spectra from Shotgun Proteomics.
J Proteome Res. 2021 Dec 3;20(12):5359-5367. doi: 10.1021/acs.jproteome.1c00485. Epub 2021 Nov 4.
4
MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics.
J Proteome Res. 2016 Mar 4;15(3):713-20. doi: 10.1021/acs.jproteome.5b00749. Epub 2016 Jan 12.
5
Clustering millions of tandem mass spectra.
J Proteome Res. 2008 Jan;7(1):113-22. doi: 10.1021/pr070361e. Epub 2007 Dec 8.
6
Deep learning embedder method and tool for mass spectra similarity search.
J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.
7
Spectral Clustering Improves Label-Free Quantification of Low-Abundant Proteins.
J Proteome Res. 2019 Apr 5;18(4):1477-1485. doi: 10.1021/acs.jproteome.8b00377. Epub 2019 Mar 22.
8
Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures.
Netw Model Anal Health Inform Bioinform. 2014 Apr;3:54. doi: 10.1007/s13721-014-0054-1.
9
Accelerating open modification spectral library searching on tensor core in high-dimensional space.
Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad404.
10
Enhanced peptide quantification using spectral count clustering and cluster abundance.
BMC Bioinformatics. 2011 Oct 28;12:423. doi: 10.1186/1471-2105-12-423.

Cited by

1
TopLib: Building and Searching Top-Down Mass Spectral Libraries for Proteoform Identification.
Anal Chem. 2025 Jun 10;97(22):11443-11453. doi: 10.1021/acs.analchem.4c06627. Epub 2025 May 29.
2
HDBind: encoding of molecular structure with hyperdimensional binary representations.
Sci Rep. 2024 Nov 23;14(1):29025. doi: 10.1038/s41598-024-80009-w.
3
Hyperdimensional computing: A fast, robust, and interpretable paradigm for biological data.
PLoS Comput Biol. 2024 Sep 24;20(9):e1012426. doi: 10.1371/journal.pcbi.1012426. eCollection 2024 Sep.
4
HyperGen: Compact and Efficient Genome Sketching using Hyperdimensional Vectors.
Bioinformatics. 2024 Jul 16;40(7). doi: 10.1093/bioinformatics/btae452.
