Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data.

Affiliations

Institute of Computer Science, Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Institute for Immunology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128, Mainz, Germany.

Publication information

BMC Bioinformatics. 2022 Jul 20;23(1):287. doi: 10.1186/s12859-022-04833-5.

Abstract

BACKGROUND

Mass spectrometry is an important experimental technique in the field of proteomics. However, the analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties.

RESULTS

This study shows that locality-sensitive hashing enables signal classification in mass spectrometry raw data at scale. An appropriate choice of algorithm parameters makes it possible to balance false-positive and false-negative rates. On synthetic data, the approach outperformed an intensity thresholding approach. Real data could be strongly reduced without losing relevant information. Our implementation scaled to up to 32 threads and supports acceleration by GPUs.
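To make the idea of collision-based classification and the parameter trade-off more concrete, here is a minimal sketch. It is not the mzBucket implementation and does not use its API. The abstract does not name a hash family, so the sketch assumes sign-random-projection hashing (which approximates cosine similarity) combined with the standard AND/OR amplification scheme. It also assumes that a window of raw data counts as "signal" when it collides with at least one other window. All function and parameter names (`make_hyperplanes`, `hash_key`, `classify`, `collision_probability`, `n_bits`, `n_tables`) are hypothetical.

```python
import math
import random


def make_hyperplanes(dim, n_bits, n_tables, seed=0):
    """Draw n_tables independent sets of n_bits random Gaussian hyperplanes."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        for _ in range(n_tables)
    ]


def hash_key(vec, planes):
    """AND step: concatenate one sign bit per hyperplane into a single key."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def classify(windows, tables):
    """OR step: flag a window as signal if, in any table, it shares its
    full key with at least one other window (assumed decision rule)."""
    is_signal = [False] * len(windows)
    for planes in tables:
        buckets = {}
        for index, window in enumerate(windows):
            buckets.setdefault(hash_key(window, planes), []).append(index)
        for members in buckets.values():
            if len(members) > 1:
                for index in members:
                    is_signal[index] = True
    return is_signal


def collision_probability(angle, n_bits, n_tables):
    """Probability that two vectors separated by `angle` (radians) collide
    in at least one of n_tables tables with n_bits sign bits each."""
    p_single = 1.0 - angle / math.pi      # one hyperplane agrees
    p_table = p_single ** n_bits          # all bits of one table agree (AND)
    return 1.0 - (1.0 - p_table) ** n_tables  # any table agrees (OR)
```

Under these assumptions, raising `n_bits` makes collisions between dissimilar windows rarer (fewer false positives, more false negatives), while raising `n_tables` gives similar windows more chances to collide (fewer false negatives, more false positives). This is one way the balance between the two error rates mentioned above can be tuned.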

CONCLUSIONS

Locality-sensitive hashing is a desirable approach for signal classification in mass spectrometry raw data.

AVAILABILITY

Generated data and code are available at https://github.com/hildebrandtlab/mzBucket . Raw data is available at https://zenodo.org/record/5036526 .
