Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.

Affiliations

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Publication information

BMC Bioinformatics. 2022 Jul 20;23(1):287. doi: 10.1186/s12859-022-04833-5.

Abstract

BACKGROUND

Mass spectrometry is an important experimental technique in proteomics. However, the analysis of certain mass spectrometry data faces two combined challenges: first, even a single experiment produces a large amount of multi-dimensional raw data; second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions about the signals' properties.

RESULTS

This study shows that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. An appropriate choice of algorithm parameters makes it possible to balance false-positive and false-negative rates. On synthetic data, the method outperformed an intensity thresholding approach. Real data could be reduced substantially without losing relevant information. Our implementation scaled up to 32 threads and supports GPU acceleration.
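
To illustrate how locality-sensitive hashing parameters can trade false positives against false negatives, the following Python sketch uses random-hyperplane hashing with AND/OR amplification. It is not taken from mzBucket: the hash family, the function names, and the parameters `rows` and `bands` are assumptions chosen for illustration, and the abstract does not state which hash family or parameters the authors use.

```python
import random


def collision_probability(p, rows, bands):
    """Chance that two items share at least one of `bands` hash keys.

    Each key concatenates `rows` independent hash bits that agree with
    probability p (AND within a key, OR across keys). `rows` and `bands`
    are hypothetical parameter names, not taken from the paper.
    """
    return 1.0 - (1.0 - p ** rows) ** bands


def make_hyperplanes(dim, rows, bands, seed=0):
    """Draw `bands` groups of `rows` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rows)]
        for _ in range(bands)
    ]


def hash_keys(vector, hyperplanes):
    """Return one key per band: the sign pattern of the vector against that band's hyperplanes."""
    keys = []
    for band in hyperplanes:
        bits = tuple(
            sum(w * x for w, x in zip(plane, vector)) >= 0.0 for plane in band
        )
        keys.append(bits)
    return keys


def collides(a, b, hyperplanes):
    """True if the two vectors share the key of at least one band."""
    keys_a = hash_keys(a, hyperplanes)
    keys_b = hash_keys(b, hyperplanes)
    return any(ka == kb for ka, kb in zip(keys_a, keys_b))


if __name__ == "__main__":
    # Hypothetical toy intensity pattern; not data from the study.
    pattern = [1.0, 0.6, 0.3, 0.1]
    planes = make_hyperplanes(dim=4, rows=8, bands=4)
    scaled = [2.0 * x for x in pattern]
    print("same pattern, doubled intensity collides:", collides(pattern, scaled, planes))
    for rows in (2, 8, 16):
        print(rows, collision_probability(0.9, rows, bands=4))
```

In this sketch, raising `rows` makes a collision between dissimilar vectors less likely, which would reduce false positives, while raising `bands` makes a collision between similar vectors more likely, which would reduce false negatives. Because only the sign of each projection enters the key, multiplying a vector by a positive factor leaves its keys unchanged, so collisions here depend on the shape of a pattern rather than on its absolute intensity.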

CONCLUSIONS

Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data.

AVAILABILITY

Generated data and code are available at https://github.com/hildebrandtlab/mzBucket . Raw data is available at https://zenodo.org/record/5036526 .

Fig. 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55ba/9301846/b22a67023a2b/12859_2022_4833_Fig1_HTML.jpg
