A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing.

Affiliation

School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA.

Publication information

Proteomics. 2020 Nov;20(21-22):e2000002. doi: 10.1002/pmic.202000002. Epub 2020 Jun 29.

Abstract

As MS/MS spectra accumulate in spectral libraries, spectral library searching has emerged as an important approach for peptide identification in proteomics. It complements the commonly used protein database searching approach, in particular for proteomic analyses of well-studied model organisms such as human. Existing spectral library searching algorithms compare a query MS/MS spectrum with every library spectrum that has a matching precursor mass and charge state, which can become computationally intensive as library sizes grow rapidly. Here, the software msSLASH is presented, which implements a fast spectral library searching algorithm based on the Locality-Sensitive Hashing (LSH) technique. The algorithm first converts the library and query spectra into bit-strings using LSH functions, and then computes the similarity between spectra whose bit-strings are highly similar. In spectral library searches of large real-world MS/MS datasets, the algorithm significantly reduced the number of spectral comparisons and, as a result, achieved a 2-9x speedup over the existing spectral library searching algorithm SpectraST. The spectral searching algorithm is implemented in C/C++ and is ready to be used in proteomic data analyses.
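
The two-step idea in the abstract (hash spectra to bit-strings, then score only spectra whose bit-strings agree) can be illustrated with a small sketch. This is not the msSLASH implementation, which is written in C/C++ and whose hash family and parameters the abstract does not specify. The sketch assumes random-hyperplane LSH for cosine similarity, 1 Da binning, and exact signature matches across a few hash tables as a stand-in for "highly similar bit-strings". All names and constants (`bin_spectrum`, `NUM_BITS`, `NUM_TABLES`, and so on) are hypothetical, and the precursor mass and charge filter is omitted for brevity.

```python
import math
import random
from collections import defaultdict

NUM_BINS = 2000   # assumption: m/z 0-2000 at 1 Da per bin
NUM_BITS = 8      # assumption: bits per signature
NUM_TABLES = 4    # assumption: independent hash tables


def bin_spectrum(peaks, bin_width=1.0):
    """Convert [(mz, intensity), ...] into a unit-length dense vector."""
    vec = [0.0] * NUM_BINS
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < NUM_BINS:
            vec[idx] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def make_hyperplanes(seed=0):
    """Random Gaussian hyperplanes, shape NUM_TABLES x NUM_BITS x NUM_BINS."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(NUM_BINS)]
             for _ in range(NUM_BITS)]
            for _ in range(NUM_TABLES)]


def signature(vec, planes):
    """One bit per hyperplane: '1' if the projection is non-negative."""
    return "".join(
        "1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0"
        for plane in planes
    )


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def build_index(library, hyperplanes):
    """Bucket every library spectrum by its signature in each table."""
    tables = [defaultdict(list) for _ in hyperplanes]
    for spec_id, vec in library.items():
        for table, planes in zip(tables, hyperplanes):
            table[signature(vec, planes)].append(spec_id)
    return tables


def search(query_vec, library, tables, hyperplanes):
    """Score only library spectra sharing a bucket with the query."""
    candidates = set()
    for table, planes in zip(tables, hyperplanes):
        candidates.update(table.get(signature(query_vec, planes), []))
    scored = [(cosine(query_vec, library[i]), i) for i in candidates]
    best = max(scored) if scored else None
    return best, len(candidates)


# Toy usage with made-up peak lists (not real library spectra).
library = {
    "pepA": bin_spectrum([(175.1, 30.0), (304.2, 100.0), (433.2, 55.0)]),
    "pepB": bin_spectrum([(147.1, 80.0), (260.2, 40.0), (989.5, 100.0)]),
}
hyperplanes = make_hyperplanes(seed=42)
tables = build_index(library, hyperplanes)
query = bin_spectrum([(175.1, 30.0), (304.2, 100.0), (433.2, 55.0)])
best, n_compared = search(query, library, tables, hyperplanes)
print(best, n_compared)
```

In this sketch, `n_compared` counts the library spectra that were actually scored. With a large library, most spectra fall into other buckets and are never compared, which is the source of the reduction in spectral comparisons that the abstract reports. Using more bits per signature shrinks the buckets, while using more tables lowers the chance of missing a true match.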

Similar articles

Building and searching tandem mass spectral libraries for peptide identification.
Mol Cell Proteomics. 2011 Dec;10(12):R111.008565. doi: 10.1074/mcp.R111.008565. Epub 2011 Sep 6.
Spectral library searching in proteomics.
Proteomics. 2016 Mar;16(5):729-40. doi: 10.1002/pmic.201500296. Epub 2016 Feb 9.
