SpecEncoder: deep metric learning for accurate peptide identification in proteomics.

Affiliation

Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i257-i265. doi: 10.1093/bioinformatics/btae220.

Abstract

MOTIVATION

Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used to identify peptides from MS/MS spectra, but both face challenges from experimental variation between replicate spectra and from similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification.
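To make the embedding-based search idea concrete, the following minimal Python sketch matches a query embedding against a combined library of experimental and predicted spectra using cosine similarity. It is an illustration only, not the SpecEncoder implementation: the three-dimensional vectors are hand-made stand-ins for what a trained encoder would output, the peptide sequences are invented, and the function names (`normalize`, `cosine`, `best_match`) as well as the choice of cosine similarity as the scoring function are assumptions made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def best_match(query, library):
    """Return (peptide, source, score) for the library entry most similar to the query."""
    scored = [(cosine(query, emb), pep, src) for pep, src, emb in library]
    score, pep, src = max(scored)
    return pep, src, score


# Hypothetical toy library: (peptide, spectrum source, embedding).
# A real system would obtain the embeddings from a trained encoder.
library = [
    ("PEPTIDEK", "experimental", [0.9, 0.1, 0.0]),
    ("SAMPLER", "experimental", [0.0, 1.0, 0.0]),
    ("ELVISLIVESK", "predicted", [0.1, 0.0, 0.95]),
]

# Hypothetical embedding of a query MS/MS spectrum.
query = [0.2, 0.05, 0.9]

peptide, source, score = best_match(query, library)
print(peptide, source, round(score, 3))
```

Because experimental and predicted spectra share one embedding space in this sketch, a single nearest-neighbor lookup covers both sources, which is the sense in which a hybrid search combines library and database searching.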

RESULTS

We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%-2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%-15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identifies 6%-12% additional unique peptides when using a combined library of experimental and predicted spectra. SpecEncoder also identifies more peptides than a deep-learning-enhanced method (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder's potential to enhance peptide identification for proteomic data analyses.

AVAILABILITY AND IMPLEMENTATION

The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder. Contact: hatang@iu.edu.


