Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

Author information

He Qingzu, Li Xiang, Zhong Jinjin, Yang Gen, Han Jiahuai, Shuai Jianwei

Affiliations

Department of Physics, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China.

Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China.

Publication information

Smart Med. 2024 Aug 27;3(3):e20240014. doi: 10.1002/SMMD.20240014. eCollection 2024 Sep.

DOI: 10.1002/SMMD.20240014

PMID: 39420951

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11425048/

Abstract

Peptide spectrum matching is the process of linking mass spectrometry data with peptide sequences. An experimental spectrum can match thousands of candidate peptides, and variable modifications lead to an exponential increase in the number of candidates. Completing the search within a limited time is a key challenge. Traditional searches speed up the process by restricting peptide mass errors and variable modifications, but this limits interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors: it matches each spectrum to all peptides in the database and increases the number of variable modifications per peptide from the conventional 3 to 20. Leveraging inverted index technology, Dear-PSM creates a high-performance index table of experimental spectra and uses deep learning algorithms for peptide validation. Through these techniques, Dear-PSM runs 7 times faster than mainstream search engines on a regular desktop computer, with a 240-fold reduction in memory consumption. Benchmark results demonstrate that Dear-PSM, in full database search mode, can reproduce over 90% of the results obtained by mainstream search engines when handling complex mass spectrometry data collected from different species using various instruments. Furthermore, it uncovers a substantial number of new peptides and proteins. Dear-PSM has been publicly released in the GitHub repository https://github.com/jianweishuai/Dear-PSM.
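The idea of an inverted index built over experimental spectra, rather than over database peptides, can be illustrated with a minimal Python sketch. This is not the Dear-PSM implementation: the bin width, the function names (`build_spectrum_index`, `fragment_mzs`, `match_counts`), the restriction to singly charged unmodified b/y ions, and the shared-peak count used as a score are all assumptions made for illustration only.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for a handful of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
}
PROTON = 1.007276
WATER = 18.010565
BIN_WIDTH = 0.02  # hypothetical m/z bin width, chosen only for this sketch


def mz_bin(mz):
    """Map an m/z value to an integer bin."""
    return int(round(mz / BIN_WIDTH))


def build_spectrum_index(spectra):
    """Inverted index: m/z bin -> set of spectrum ids with a peak in that bin."""
    index = defaultdict(set)
    for spec_id, peaks in spectra.items():
        for mz in peaks:
            index[mz_bin(mz)].add(spec_id)
    return index


def fragment_mzs(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    frags = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        frags.append(prefix + PROTON)                  # b ion
        frags.append(total - prefix + WATER + PROTON)  # y ion
    return frags


def match_counts(index, peptide):
    """Count, for every indexed spectrum, how many peptide fragments hit a peak."""
    counts = defaultdict(int)
    for mz in fragment_mzs(peptide):
        b = mz_bin(mz)
        hits = set()
        for neighbor in (b - 1, b, b + 1):  # tolerate bin-edge effects
            hits |= index.get(neighbor, set())
        for spec_id in hits:  # each fragment counts at most once per spectrum
            counts[spec_id] += 1
    return dict(counts)
```

Because the lookup goes from fragment bin to spectra, a single pass over one peptide's fragments touches every spectrum that shares a peak with it, with no precursor-mass filter. That is the property a full database search needs. A real engine would replace the raw shared-peak count with a proper score and a validation step.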

Similar articles

AttnPep: A Self-Attention-Based Deep Learning Method for Peptide Identification in Shotgun Proteomics.

J Proteome Res. 2024 Feb 2;23(2):834-843. doi: 10.1021/acs.jproteome.3c00729. Epub 2024 Jan 22.

Dear-DIA: Deep Autoencoder Enables Deconvolution of Data-Independent Acquisition Proteomics.

Research (Wash D C). 2023 Jun 26;6:0179. doi: 10.34133/research.0179. eCollection 2023.

TIDD: tool-independent and data-dependent machine learning for peptide identification.

BMC Bioinformatics. 2022 Mar 30;23(1):109. doi: 10.1186/s12859-022-04640-y.

Comparative database search engine analysis on massive tandem mass spectra of pork-based food products for halal proteomics.

J Proteomics. 2021 Jun 15;241:104240. doi: 10.1016/j.jprot.2021.104240. Epub 2021 Apr 21.

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics.

J Proteomics. 2017 Jan 6;150:170-182. doi: 10.1016/j.jprot.2016.08.002. Epub 2016 Aug 4.

psm_utils: A High-Level Python API for Parsing and Handling Peptide-Spectrum Matches and Proteomics Search Results.

J Proteome Res. 2023 Feb 3;22(2):557-560. doi: 10.1021/acs.jproteome.2c00609. Epub 2022 Dec 12.

Visualizing the agreement of peptide assignments between different search engines.

J Mass Spectrom. 2020 Aug;55(8):e4471. doi: 10.1002/jms.4471. Epub 2019 Dec 3.

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

MSblender: A probabilistic approach for integrating peptide identifications from multiple database search engines.

J Proteome Res. 2011 Jul 1;10(7):2949-58. doi: 10.1021/pr2002116. Epub 2011 Apr 29.

Cited by

E-SegNet: E-Shaped Structure Networks for Accurate 2D and 3D Medical Image Segmentation.

Research (Wash D C). 2025 Sep 3;8:0869. doi: 10.34133/research.0869. eCollection 2025.

A Machine Learning Model for Diagnosing Opportunistic Infections in HIV Patients: Broad Applicability Across Infection Types.

J Cell Mol Med. 2025 Mar;29(6):e70497. doi: 10.1111/jcmm.70497.

References

SeFilter-DIA: Squeeze-and-Excitation Network for Filtering High-Confidence Peptides of Data-Independent Acquisition Proteomics.

Interdiscip Sci. 2024 Sep;16(3):579-592. doi: 10.1007/s12539-024-00611-4. Epub 2024 Mar 12.

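AttnPep: A Self-Attention-Based Deep Learning Method for Peptide Identification in Shotgun Proteomics.

J Proteome Res. 2024 Feb 2;23(2):834-843. doi: 10.1021/acs.jproteome.3c00729. Epub 2024 Jan 22.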

Sage: An Open-Source Tool for Fast Proteomics Searching and Quantification at Scale.

J Proteome Res. 2023 Nov 3;22(11):3652-3659. doi: 10.1021/acs.jproteome.3c00486. Epub 2023 Oct 11.

Deciphering "the language of nature": A transformer-based language model for deleterious mutations in proteins.

Innovation (Camb). 2023 Jul 27;4(5):100487. doi: 10.1016/j.xinn.2023.100487. eCollection 2023 Sep 11.

Dear-DIA: Deep Autoencoder Enables Deconvolution of Data-Independent Acquisition Proteomics.

Research (Wash D C). 2023 Jun 26;6:0179. doi: 10.34133/research.0179. eCollection 2023.

An Introduction to Mass Spectrometry-Based Proteomics.

J Proteome Res. 2023 Jul 7;22(7):2151-2171. doi: 10.1021/acs.jproteome.2c00838. Epub 2023 Jun 1.

Automated Enrichment of Phosphotyrosine Peptides for High-Throughput Proteomics.

J Proteome Res. 2023 Jun 2;22(6):1868-1880. doi: 10.1021/acs.jproteome.2c00850. Epub 2023 Apr 25.

dia-PASEF data analysis using FragPipe and DIA-NN for deep proteomics of low sample amounts.

Nat Commun. 2022 Jul 8;13(1):3944. doi: 10.1038/s41467-022-31492-0.

Simple, efficient and thorough shotgun proteomic analysis with PatternLab V.

Nat Protoc. 2022 Jul;17(7):1553-1578. doi: 10.1038/s41596-022-00690-x. Epub 2022 Apr 11.

A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics.

Sci Data. 2022 Mar 30;9(1):126. doi: 10.1038/s41597-022-01216-6.
