Pairwise Attention: Leveraging Mass Differences to Enhance De Novo Sequencing of Mass Spectra.

Authors

Lapin Joel, Nilsson Alfred, Wilhelm Mathias, Käll Lukas

Affiliations

Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany.

Science for Life Laboratory, KTH - Royal Institute of Technology, 171 65 Solna, Sweden.

Publication information

J Proteome Res. 2025 Jul 4;24(7):3722-3730. doi: 10.1021/acs.jproteome.5c00063. Epub 2025 Jun 2.

Abstract

A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which is not available in some applications. Deep learning-based de novo sequencing addresses this limitation by predicting peptide sequences directly from MS2 data. Applying the transformer architecture to de novo sequencing has produced state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an improved transformer encoder inspired by the heuristics used in the manual interpretation of spectra. We modify the attention mechanism with a learned bias based on pairwise mass differences, termed Pairwise Attention (PA). Adding PA improves average peptide precision at 100% coverage by 12.7% (5.9 percentage points) over our base transformer on the original nine-species benchmark. We also achieve a 7.4% increase over the previously published model Casanovo. Our MS2 encoding strategy is largely orthogonal to other transformer-based models that encode MS2 spectra, enabling straightforward integration into existing deep-learning approaches. Our results show that integrating domain-specific knowledge into transformers boosts de novo sequencing performance.
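
The idea of a learned attention bias based on pairwise mass differences can be illustrated with a minimal NumPy sketch. The abstract does not describe how the bias is parameterized, so everything below is an assumption made for illustration: the sinusoidal featurization of the m/z differences, the linear projection `w_bias` that maps those features to one bias value per head, and all function names, shapes, and wavelength ranges. It is not the authors' implementation. The motivation follows manual interpretation, where the gap between two fragment peaks is compared against amino acid residue masses.

```python
import numpy as np

def sinusoidal_features(x, dim=16, min_wl=0.001, max_wl=10000.0):
    """Encode real values x as sin/cos features; output shape is x.shape + (dim,).

    Assumption: dim is even, and wavelengths are spaced geometrically
    between min_wl and max_wl (hypothetical range, in m/z units).
    """
    half = dim // 2
    exponents = np.arange(half) / max(half - 1, 1)
    wavelengths = min_wl * (max_wl / min_wl) ** exponents
    angles = 2.0 * np.pi * x[..., None] / wavelengths
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def pairwise_attention(h, mz, w_q, w_k, w_v, w_bias):
    """Multi-head self-attention over peaks with an additive pairwise bias.

    h:      (n_peaks, d_model) peak embeddings
    mz:     (n_peaks,) m/z value of each peak
    w_q, w_k, w_v: (n_heads, d_model, d_head) projection weights
    w_bias: (feat_dim, n_heads) hypothetical learned projection that turns
            mass-difference features into one bias value per head
    Returns (output, attention_weights).
    """
    n_heads, _, d_head = w_q.shape
    q = np.einsum("nd,hde->hne", h, w_q)
    k = np.einsum("nd,hde->hne", h, w_k)
    v = np.einsum("nd,hde->hne", h, w_v)

    # Standard scaled dot-product scores, shape (n_heads, n_peaks, n_peaks).
    scores = np.einsum("hie,hje->hij", q, k) / np.sqrt(d_head)

    # Signed m/z difference for every pair of peaks, shape (n_peaks, n_peaks).
    delta = mz[:, None] - mz[None, :]
    feats = sinusoidal_features(delta, dim=w_bias.shape[0])
    bias = np.einsum("ijf,fh->hij", feats, w_bias)

    # The bias is added to the logits before the softmax.
    attn = softmax(scores + bias, axis=-1)
    out = np.einsum("hij,hje->hie", attn, v)
    out = out.transpose(1, 0, 2).reshape(h.shape[0], n_heads * d_head)
    return out, attn

# Usage with random weights and illustrative (not measured) m/z values.
rng = np.random.default_rng(0)
n_peaks, d_model, n_heads, d_head, feat_dim = 5, 8, 2, 4, 16
h = rng.normal(size=(n_peaks, d_model))
mz = np.array([114.09, 175.12, 288.20, 401.29, 502.33])
w_q = rng.normal(size=(n_heads, d_model, d_head))
w_k = rng.normal(size=(n_heads, d_model, d_head))
w_v = rng.normal(size=(n_heads, d_model, d_head))
w_bias = rng.normal(size=(feat_dim, n_heads))

out, attn = pairwise_attention(h, mz, w_q, w_k, w_v, w_bias)
# Setting the bias projection to zero recovers ordinary self-attention.
out_plain, attn_plain = pairwise_attention(
    h, mz, w_q, w_k, w_v, np.zeros((feat_dim, n_heads))
)
```

Because the bias enters only as an additive term on the attention logits, a sketch like this leaves the rest of the encoder untouched, which is consistent with the abstract's remark that the encoding strategy is largely orthogonal to other transformer-based models.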

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7263/12235698/aee05d91f5cc/pr5c00063_0001.jpg
