SpeCollate: Deep cross-modal similarity network for mass spectrometry data based peptide deductions.

Affiliation

School of Computing & Information Sciences, Florida International University, Miami, FL, United States of America.

Publication information

PLoS One. 2021 Oct 29;16(10):e0259349. doi: 10.1371/journal.pone.0259349. eCollection 2021.

Abstract

Historically, database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. These algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra, using heuristic similarity-scoring functions to compare each experimental spectrum with a theoretical one. However, the heuristic nature of the scoring functions and the simple transformation of peptides into theoretical spectra, combined with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed-size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that, for closed search on real-world data, SpeCollate outperforms Crux and MSFragger in the number of peptide-spectrum matches (PSMs) and unique peptides identified at 1% FDR. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass spectra for MS-based proteomics. We believe SpeCollate is a significant step towards developing machine-learning solutions for MS-based omics data analysis.
SpeCollate is available at https://deepspecs.github.io/.
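
As a rough illustration of the idea of online hardest negative mining in a shared Euclidean embedding space, the sketch below picks, for each spectrum embedding in a batch, the closest non-matching peptide embedding and applies a generic triplet margin loss. This is not the SNAP-loss and does not use sextuplets, since neither is defined in the abstract; the function names, the `margin` value, and the toy 2-D embeddings are all assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """L2 distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_negative(anchor_idx, anchors, candidates):
    """Return (index, distance) of the closest non-matching candidate.

    Assumption: anchors[i] and candidates[i] form a matching
    (spectrum, peptide) pair; every other candidate in the batch
    is treated as a negative.
    """
    best_j, best_d = None, math.inf
    for j, cand in enumerate(candidates):
        if j == anchor_idx:
            continue  # skip the positive pair
        d = euclidean(anchors[anchor_idx], cand)
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


def triplet_margin_loss(spectra, peptides, margin=0.2):
    """Mean hinge loss using the hardest in-batch negative per spectrum.

    Generic triplet loss used as a stand-in; NOT the paper's SNAP-loss.
    """
    total = 0.0
    for i in range(len(spectra)):
        pos = euclidean(spectra[i], peptides[i])
        _, neg = hardest_negative(i, spectra, peptides)
        total += max(0.0, pos - neg + margin)
    return total / len(spectra)


# Hypothetical 2-D embeddings; row i of each list is a matching pair.
spectra = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
peptides = [[0.1, 0.0], [1.0, 1.2], [0.2, 0.1]]

if __name__ == "__main__":
    print(hardest_negative(0, spectra, peptides))
    print(triplet_margin_loss(spectra, peptides))
```

In a real training loop the embeddings would come from the spectrum and peptide sub-networks and the mining would run on each mini-batch, so the hardest negatives change as the networks are updated.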
