Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.

Affiliations

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Publication information

BMC Bioinformatics. 2022 Jul 20;23(1):287. doi: 10.1186/s12859-022-04833-5.

PMID:35858828

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9301846/

Abstract

BACKGROUND

Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties.

RESULTS

In this study, it is shown that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. Through appropriate choice of algorithm parameters it is possible to balance false-positive and false-negative rates. On synthetic data, a superior performance compared to an intensity thresholding approach was achieved. Real data could be strongly reduced without losing relevant information. Our implementation scaled out up to 32 threads and supports acceleration by GPUs.
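
The abstract does not specify the hashing scheme, so what follows is only a minimal Python sketch under stated assumptions, not the mzBucket implementation. It uses random-hyperplane LSH (SimHash): one random hyperplane places two vectors separated by angle theta on the same side with probability p = 1 - theta/pi, and grouping hyperplanes into b bands of r rows each gives a pair a collision probability of 1 - (1 - p^r)^b. Raising r suppresses chance collisions (fewer false positives), while raising b recovers near-duplicates that would otherwise be missed (fewer false negatives), which illustrates one way parameter choice can balance the two error rates. The names SimHashLSH, collision_probability and classify_recurring, and the rule that a vector counts as signal when it shares a bucket with any other vector, are hypothetical illustrations.

```python
from __future__ import annotations

from collections import defaultdict

import numpy as np


def collision_probability(cos_sim: float, rows: int, bands: int) -> float:
    """Chance that two vectors share at least one band key under
    random-hyperplane LSH with `bands` bands of `rows` hyperplanes each."""
    theta = np.arccos(np.clip(cos_sim, -1.0, 1.0))  # angle between the vectors
    p = 1.0 - theta / np.pi                          # one hyperplane agrees
    return float(1.0 - (1.0 - p**rows) ** bands)     # AND within, OR across bands


class SimHashLSH:
    """Toy random-hyperplane index (hypothetical, not the mzBucket code)."""

    def __init__(self, dim: int, rows: int, bands: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # One Gaussian normal vector per hash bit, grouped as (bands, rows, dim).
        self.planes = rng.standard_normal((bands, rows, dim))

    def band_keys(self, x: np.ndarray) -> list[bytes]:
        """Sign pattern of x against each band's hyperplanes, as hashable bytes."""
        bits = np.einsum("brd,d->br", self.planes, x) > 0  # shape (bands, rows)
        return [row.tobytes() for row in bits]


def classify_recurring(vectors: list[np.ndarray], index: SimHashLSH) -> list[bool]:
    """Hypothetical rule: label a vector as signal (True) if it shares a
    bucket in any band with at least one other vector, else noise (False)."""
    buckets = defaultdict(set)  # (band, key) -> indices of vectors in that bucket
    all_keys = [index.band_keys(v) for v in vectors]
    for i, keys in enumerate(all_keys):
        for band, key in enumerate(keys):
            buckets[(band, key)].add(i)
    return [
        any(len(buckets[(band, key)]) > 1 for band, key in enumerate(keys))
        for keys in all_keys
    ]


if __name__ == "__main__":
    # More rows per band -> fewer chance collisions; more bands -> fewer misses.
    for rows, bands in [(4, 8), (16, 8), (16, 32)]:
        print(rows, bands, round(collision_probability(0.5, rows, bands), 3))
```

Hyperplane hashing measures cosine similarity, so raw, non-negative intensity windows would likely need centering or another transform before hashing; that preprocessing step is likewise an assumption here.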

CONCLUSIONS

Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data.

AVAILABILITY

Generated data and code are available at https://github.com/hildebrandtlab/mzBucket . Raw data is available at https://zenodo.org/record/5036526 .

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55ba/9301846/b22a67023a2b/12859_2022_4833_Fig1_HTML.jpg
