Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany.
Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad376.
Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics, and realistic prediction of peptide fragmentation is more feasible than ever. Still, spectral prediction is currently used mainly to validate database search results or to search confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to the large search space problems that often occur in metaproteomics or proteogenomics.
In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes.
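The overall shape of such a workflow (digest proteins in silico, attach a predicted spectrum to each peptide, index the fragment peaks, then look up experimental spectra in the index) can be illustrated with a minimal Python sketch. This is not Mistle's actual implementation or data layout: the function names, the 0.5 m/z bin width, the binned dot-product score, and the toy peak lists (which stand in for Prosit predictions) are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict


def tryptic_digest(protein, min_len=7, max_len=30):
    """Simplified trypsin rule: cleave after K or R unless followed by P.

    No missed cleavages are generated; peptides outside the length
    window are dropped.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if min_len <= len(p) <= max_len]


def normalize(peaks):
    """Scale intensities so that the spectrum has unit Euclidean length."""
    norm = math.sqrt(sum(inten * inten for _, inten in peaks))
    return [(mz, inten / norm) for mz, inten in peaks]


def build_index(library, bin_width=0.5):
    """Map each m/z bin to the library entries that have a peak in it.

    `library` is a dict: peptide -> list of (mz, intensity) pairs.
    In a real workflow these peaks would come from a spectrum predictor;
    here they are hypothetical toy values.
    """
    index = defaultdict(list)
    for peptide, peaks in library.items():
        for mz, inten in normalize(peaks):
            index[int(mz / bin_width)].append((peptide, inten))
    return index


def search(index, query_peaks, bin_width=0.5):
    """Score every library entry sharing at least one bin with the query.

    The score is a binned dot product of normalized intensities. Only
    entries reachable through the index are touched, so library spectra
    without shared peaks are never visited.
    """
    scores = defaultdict(float)
    for mz, inten in normalize(query_peaks):
        for peptide, lib_inten in index.get(int(mz / bin_width), ()):
            scores[peptide] += inten * lib_inten
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy library with made-up peak lists (placeholders for predicted spectra).
library = {
    "PEPTIDEK": [(100.2, 1.0), (200.3, 0.5), (300.4, 0.2)],
    "SAMPLER": [(100.2, 0.3), (250.1, 1.0), (400.7, 0.6)],
}
index = build_index(library)

# A hypothetical experimental spectrum resembling the first entry.
query = [(100.25, 0.9), (200.35, 0.6), (300.3, 0.1)]
hits = search(index, query)
best_peptide, best_score = hits[0]
```

The sketch omits much of what a production search engine needs, such as a precursor mass filter, tolerance handling at bin edges, merging of peaks that fall into the same bin, and on-disk index storage. It is only meant to show why an inverted fragment index avoids comparing each query against every library spectrum.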
Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.