SpecEncoder: deep metric learning for accurate peptide identification in proteomics.

Affiliation

Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i257-i265. doi: 10.1093/bioinformatics/btae220.

DOI:10.1093/bioinformatics/btae220

PMID:38940141

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11211836/

Abstract

MOTIVATION

Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS/MS spectra; both, however, face challenges due to experimental variations between replicated spectra and similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification.
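The embedding-based search described above can be illustrated with a minimal sketch. It is not the SpecEncoder model: the `embed` function below is a hand-written stand-in (m/z binning, square-root intensity scaling, L2 normalization) for a learned encoder, and the peptides, peak lists, and the `search` helper are hypothetical. It only shows the general idea of placing experimental and predicted spectra in one vector space and identifying a query spectrum by its nearest library embedding.

```python
import math

def embed(peaks, n_bins=20, max_mz=2000.0):
    """Toy stand-in for a learned encoder (assumption, not SpecEncoder's network):
    bin peaks by m/z, square-root scale intensities, then L2-normalize."""
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = min(int(mz / max_mz * n_bins), n_bins - 1)
        vec[idx] += math.sqrt(intensity)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def cosine(a, b):
    # Both inputs are unit vectors, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def search(query_peaks, library):
    """Return (peptide, source, score) for the library entry closest to the query."""
    q = embed(query_peaks)
    best = max(library, key=lambda entry: cosine(q, entry["embedding"]))
    return best["peptide"], best["source"], cosine(q, best["embedding"])

# Hypothetical combined library mixing an experimental and a predicted spectrum.
library = [
    {"peptide": "PEPTIDEK", "source": "experimental",
     "peaks": [(175.1, 100.0), (500.3, 50.0), (900.4, 80.0)]},
    {"peptide": "SAMPLER", "source": "predicted",
     "peaks": [(250.2, 60.0), (650.3, 90.0), (1250.6, 40.0)]},
]
for entry in library:
    entry["embedding"] = embed(entry["peaks"])

# A noisy replicate of the first library spectrum (hypothetical values).
query = [(175.2, 90.0), (500.1, 55.0), (900.9, 70.0)]
peptide, source, score = search(query, library)
print(peptide, source, round(score, 3))
```

In a real system the binning step would be replaced by a trained network, and the linear scan by an indexed nearest-neighbor search; the sketch keeps both trivial so the library structure stays visible.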

RESULTS

We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identifies 6%-12% additional unique peptides when using a combined library of experimental and predicted spectra. SpecEncoder also identifies more peptides than deep-learning-enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification in proteomic data analyses.

AVAILABILITY AND IMPLEMENTATION

The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.

Similar articles

MSBooster: improving peptide identification rates using deep learning-based features.

Nat Commun. 2023 Jul 27;14(1):4539. doi: 10.1038/s41467-023-40129-9.

Constructing a Tandem Mass Spectral Library for Forensic Ricin Identification.

J Proteome Res. 2019 Nov 1;18(11):3926-3935. doi: 10.1021/acs.jproteome.9b00377. Epub 2019 Oct 14.

Enhanced peptide quantification using spectral count clustering and cluster abundance.

BMC Bioinformatics. 2011 Oct 28;12:423. doi: 10.1186/1471-2105-12-423.

Mistle: bringing spectral library predictions to metaproteomics with an efficient search index.

Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad376.

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

Extending the coverage of spectral libraries: a neighbor-based approach to predicting intensities of peptide fragmentation spectra.

Proteomics. 2013 Mar;13(5):756-65. doi: 10.1002/pmic.201100670. Epub 2013 Feb 4.

Sensitive and Specific Spectral Library Searching with CompOmics Spectral Library Searching Tool and Percolator.

J Proteome Res. 2022 May 6;21(5):1365-1370. doi: 10.1021/acs.jproteome.2c00075. Epub 2022 Apr 21.

Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing.

J Proteome Res. 2018 Oct 5;17(10):3463-3474. doi: 10.1021/acs.jproteome.8b00359. Epub 2018 Sep 13.

msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing.

J Proteome Res. 2019 Jan 4;18(1):147-158. doi: 10.1021/acs.jproteome.8b00448. Epub 2018 Dec 14.

References cited in this article

Accurate de novo peptide sequencing using fully convolutional neural networks.

Nat Commun. 2023 Dec 2;14(1):7974. doi: 10.1038/s41467-023-43010-x.

MSBooster: improving peptide identification rates using deep learning-based features.

Nat Commun. 2023 Jul 27;14(1):4539. doi: 10.1038/s41467-023-40129-9.

Contrastive Learning-Based Embedder for the Representation of Tandem Mass Spectra.

Anal Chem. 2023 May 23;95(20):7888-7896. doi: 10.1021/acs.analchem.3c00260. Epub 2023 May 12.

A learned embedding for efficient joint analysis of millions of mass spectra.

Nat Methods. 2022 Jun;19(6):675-678. doi: 10.1038/s41592-022-01496-1. Epub 2022 May 30.

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships.

PLoS Comput Biol. 2021 Feb 16;17(2):e1008724. doi: 10.1371/journal.pcbi.1008724. eCollection 2021 Feb.

Deep learning embedder method and tool for mass spectra similarity search.

J Proteomics. 2021 Feb 10;232:104070. doi: 10.1016/j.jprot.2020.104070. Epub 2020 Dec 8.

Full-Spectrum Prediction of Peptides Tandem Mass Spectra using Deep Neural Network.

Anal Chem. 2020 Mar 17;92(6):4275-4283. doi: 10.1021/acs.analchem.9b04867. Epub 2020 Feb 25.

Expanding the Use of Spectral Libraries in Proteomics.

J Proteome Res. 2018 Dec 7;17(12):4051-4060. doi: 10.1021/acs.jproteome.8b00485. Epub 2018 Oct 11.

Assembling the Community-Scale Discoverable Human Proteome.

Cell Syst. 2018 Oct 24;7(4):412-421.e5. doi: 10.1016/j.cels.2018.08.004. Epub 2018 Aug 29.

Extending a Tandem Mass Spectral Library to Include MS2 Spectra of Fragment Ions Produced In-Source and MSn Spectra.

J Am Soc Mass Spectrom. 2017 Nov;28(11):2280-2287. doi: 10.1007/s13361-017-1748-2. Epub 2017 Jul 18.
