Mistle: bringing spectral library predictions to metaproteomics with an efficient search index.

Affiliations

Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany.

Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany.

Publication information

Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad376.

DOI:10.1093/bioinformatics/btad376
PMID:37294786
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10313348/
Abstract

MOTIVATION

Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics and authentic prediction for peptide fragmentation is more feasible than ever. Still, at this point spectral prediction is mainly used to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to large search space problems that often occur in metaproteomics or proteogenomics.

RESULTS

In this study, we showcase a workflow that uses Prosit for spectral library predictions on two common metaproteomes and implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. Hence, the workflow emulates a classic protein sequence database search with protein digestion but builds a searchable index from spectral predictions as an in-between step. We compare Mistle to popular search engines, both on a spectral and database search level, and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in terms of run time and proves to be extremely memory efficient with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. covering comprehensive sequence databases of diverse microbiomes.
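The Results paragraph describes a three-stage workflow: digest proteins in silico, predict a spectrum for every resulting peptide, and search experimental spectra against an index built from those predictions. The sketch below illustrates that general idea in Python under stated assumptions. It is not Mistle's algorithm or API (see the paper and the GitHub repository for those). The names `tryptic_digest`, `fragment_ions` and `PrecursorIndex`, the 0.02 m/z bin width and the 10 ppm precursor tolerance are all hypothetical choices. `fragment_ions` is a crude stand-in returning unit-intensity singly charged b/y ions where a real workflow would call a predictor such as Prosit.

```python
import bisect
import math
import re
from dataclasses import dataclass

# Monoisotopic residue masses in Da (cysteine left unmodified for simplicity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(protein: str, min_len: int = 2) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if len(p) >= min_len]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide: str) -> list[tuple[float, float]]:
    """Hypothetical stand-in for a spectrum predictor (e.g. Prosit):
    singly charged b and y ions, all with intensity 1.0."""
    peaks = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        peaks += [(b, 1.0), (y, 1.0)]
    return peaks


def bin_peaks(peaks, bin_width: float = 0.02) -> dict[int, float]:
    """Map (m/z, intensity) peaks to integer bins, giving a sparse vector."""
    binned: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class LibraryEntry:
    peptide: str
    precursor_mass: float
    spectrum: dict[int, float]


class PrecursorIndex:
    """Library sorted by precursor mass; a query only scores entries whose
    precursor mass lies inside its ppm tolerance window."""

    def __init__(self, entries: list[LibraryEntry]):
        self.entries = sorted(entries, key=lambda e: e.precursor_mass)
        self.masses = [e.precursor_mass for e in self.entries]

    def search(self, precursor_mass: float, peaks, tol_ppm: float = 10.0):
        tol = precursor_mass * tol_ppm * 1e-6
        lo = bisect.bisect_left(self.masses, precursor_mass - tol)
        hi = bisect.bisect_right(self.masses, precursor_mass + tol)
        query = bin_peaks(peaks)
        scored = [(e.peptide, cosine(query, e.spectrum)) for e in self.entries[lo:hi]]
        # Best (peptide, score) pair, or None if no candidate falls in the window.
        return max(scored, key=lambda s: s[1], default=None)


if __name__ == "__main__":
    peptides = tryptic_digest("PEPTIDEKSAMPLERGLYCINEK")
    print(peptides)  # ['PEPTIDEK', 'SAMPLER', 'GLYCINEK']
    library = [LibraryEntry(p, peptide_mass(p), bin_peaks(fragment_ions(p))) for p in peptides]
    index = PrecursorIndex(library)
    print(index.search(peptide_mass("SAMPLER"), fragment_ions("SAMPLER")))
```

Sorting the library by precursor mass means each query is compared only against the few candidates inside its tolerance window rather than the whole library; the run-time and memory figures reported for Mistle come from its own index design, which this toy does not attempt to reproduce.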

AVAILABILITY AND IMPLEMENTATION

Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/256c141d858e/btad376f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/b17b45995e72/btad376f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/6a45e2dccfbb/btad376f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/e70d5fb2fbe0/btad376f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/6ca20eb15a1c/btad376f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/14eea84876c4/btad376f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/f013d3e80948/btad376f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/5313753905e1/btad376f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/faaf/10313348/f336351226ef/btad376f8.jpg

Similar articles

1
Mistle: bringing spectral library predictions to metaproteomics with an efficient search index.
Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad376.
2
Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing.
J Proteome Res. 2018 Oct 5;17(10):3463-3474. doi: 10.1021/acs.jproteome.8b00359. Epub 2018 Sep 13.
3
Comparative database search engine analysis on massive tandem mass spectra of pork-based food products for halal proteomics.
J Proteomics. 2021 Jun 15;241:104240. doi: 10.1016/j.jprot.2021.104240. Epub 2021 Apr 21.
4
In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.
J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.
5
Accelerating open modification spectral library searching on tensor core in high-dimensional space.
Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad404.
6
Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator.
J Proteome Res. 2022 May 6;21(5):1365-1370. doi: 10.1021/acs.jproteome.2c00075. Epub 2022 Apr 21.
7
Spectral library searching in proteomics.
Proteomics. 2016 Mar;16(5):729-40. doi: 10.1002/pmic.201500296. Epub 2016 Feb 9.
8
SpecEncoder: deep metric learning for accurate peptide identification in proteomics.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i257-i265. doi: 10.1093/bioinformatics/btae220.
9
A Hybrid Spectral Library and Protein Sequence Database Search Strategy for Bottom-Up and Top-Down Proteomic Data Analysis.
J Proteome Res. 2022 Nov 4;21(11):2609-2618. doi: 10.1021/acs.jproteome.2c00305. Epub 2022 Oct 7.
10
Scribe: Next Generation Library Searching for DDA Experiments.
J Proteome Res. 2023 Feb 3;22(2):482-490. doi: 10.1021/acs.jproteome.2c00672. Epub 2023 Jan 25.

Cited by

1
Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.
Nat Methods. 2025 Jun 16. doi: 10.1038/s41592-025-02719-x.
2
Benchmarking Spectral Library and Database Search Approaches for Metaproteomics Using a Ground-Truth Microbiome Dataset.
bioRxiv. 2025 May 20:2025.05.15.654320. doi: 10.1101/2025.05.15.654320.
3
The microbiologist's guide to metaproteomics.
Imeta. 2025 May 6;4(3):e70031. doi: 10.1002/imt2.70031. eCollection 2025 Jun.
4
FIORA: Local neighborhood-based prediction of compound mass spectra from single fragmentation events.
Nat Commun. 2025 Mar 7;16(1):2298. doi: 10.1038/s41467-025-57422-4.
5
Assessment of false discovery rate control in tandem mass spectrometry analysis using entrapment.
bioRxiv. 2025 Jan 21:2024.06.01.596967. doi: 10.1101/2024.06.01.596967.
6
Koina: Democratizing machine learning for proteomics research.
bioRxiv. 2024 Jun 3:2024.06.01.596953. doi: 10.1101/2024.06.01.596953.
7
Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification.
Mol Cell Proteomics. 2024 Jul;23(7):100798. doi: 10.1016/j.mcpro.2024.100798. Epub 2024 Jun 11.

References

1
Critical Assessment of MetaProteome Investigation (CAMPI): a multi-laboratory comparison of established workflows.
Nat Commun. 2021 Dec 15;12(1):7305. doi: 10.1038/s41467-021-27542-8.
2
DeepLC can predict retention times for peptides that carry as-yet unseen modifications.
Nat Methods. 2021 Nov;18(11):1363-1369. doi: 10.1038/s41592-021-01301-5. Epub 2021 Oct 28.
3
Spectral Prediction Features as a Solution for the Search Space Size Problem in Proteogenomics.
Mol Cell Proteomics. 2021;20:100076. doi: 10.1016/j.mcpro.2021.100076. Epub 2021 Apr 3.
4
A Fast and Memory-Efficient Spectral Library Search Algorithm Using Locality-Sensitive Hashing.
Proteomics. 2020 Nov;20(21-22):e2000002. doi: 10.1002/pmic.202000002. Epub 2020 Jun 29.
5
Generating high quality libraries for DIA MS with empirically corrected peptide predictions.
Nat Commun. 2020 Mar 25;11(1):1548. doi: 10.1038/s41467-020-15346-1.
6
Following the community development of SIHUMIx - a new intestinal model for bioreactor use.
Gut Microbes. 2020 Jul 3;11(4):1116-1129. doi: 10.1080/19490976.2019.1702431. Epub 2020 Jan 10.
7
Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning.
Nat Methods. 2019 Jun;16(6):509-518. doi: 10.1038/s41592-019-0426-7. Epub 2019 May 27.
8
Challenges and promise at the interface of metaproteomics and genomics: an overview of recent progress in metaproteogenomic data analysis.
Expert Rev Proteomics. 2019 May;16(5):375-390. doi: 10.1080/14789450.2019.1609944. Epub 2019 Apr 30.
9
DREAM-Yara: an exact read mapper for very large databases with short update time.
Bioinformatics. 2018 Sep 1;34(17):i766-i772. doi: 10.1093/bioinformatics/bty567.
10
pDeep: Predicting MS/MS Spectra of Peptides with Deep Learning.
Anal Chem. 2017 Dec 5;89(23):12690-12697. doi: 10.1021/acs.analchem.7b02566. Epub 2017 Nov 21.