Pairwise Attention: Leveraging Mass Differences to Enhance De Novo Sequencing of Mass Spectra

Authors

Lapin Joel, Nilsson Alfred, Wilhelm Mathias, Käll Lukas

Affiliations

Computational Mass Spectrometry, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany.

Science for Life Laboratory, KTH - Royal Institute of Technology, 171 65 Solna, Sweden.

Publication information

J Proteome Res. 2025 Jul 4;24(7):3722-3730. doi: 10.1021/acs.jproteome.5c00063. Epub 2025 Jun 2.

DOI:10.1021/acs.jproteome.5c00063

PMID:40454436

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12235698/

Abstract

A fundamental challenge in mass spectrometry-based proteomics is determining which peptide generated a given MS2 spectrum. Peptide sequencing typically relies on matching spectra against a known sequence database, which in some applications is not available. Deep learning-based de novo sequencing can address this limitation by directly predicting peptide sequences from MS2 data. We have seen the application of the transformer architecture to de novo sequencing produce state-of-the-art results on the so-called nine-species benchmark. In this study, we propose an improved transformer encoder inspired by the heuristics used in the manual interpretation of spectra. We modify the attention mechanism with a learned bias based on pairwise mass differences, termed Pairwise Attention (PA). Adding PA improves average peptide precision at 100% coverage by 12.7% (5.9 percentage points) over our base transformer on the original nine-species benchmark. We have also achieved a 7.4% increase over the previously published model Casanovo. Our MS2 encoding strategy is largely orthogonal to other transformer-based models encoding MS2 spectra, enabling straightforward integration into existing deep-learning approaches. Our results show that integrating domain-specific knowledge into transformers boosts de novo sequencing performance.
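The abstract describes PA only as a learned bias on the attention mechanism derived from pairwise mass differences. As a rough illustration, and not the authors' implementation, the NumPy sketch below adds such a bias to single-head scaled dot-product attention over the peaks of one spectrum. The sinusoidal featurization of the m/z differences, the function names, and all parameter shapes are assumptions made for this example; the weights are random here, whereas in a real model they would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pairwise_mass_features(mz, n_freqs):
    """Sinusoidal features of every pairwise m/z difference.

    Returns an array of shape (n_peaks, n_peaks, 2 * n_freqs).
    The wavelength range is an arbitrary choice for this sketch.
    """
    delta = mz[:, None] - mz[None, :]                  # (n, n) mass differences
    wavelengths = np.geomspace(0.01, 1000.0, n_freqs)  # (n_freqs,)
    angles = 2.0 * np.pi * delta[..., None] / wavelengths
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


def pairwise_attention(x, mz, w_q, w_k, w_v, w_bias):
    """Single-head attention with an additive pairwise mass-difference bias.

    x      : (n_peaks, d_model) peak embeddings
    mz     : (n_peaks,) peak m/z values
    w_bias : (2 * n_freqs,) projection of the pairwise features to a scalar bias
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(q.shape[-1])            # standard attention logits
    n_freqs = w_bias.shape[0] // 2
    bias = pairwise_mass_features(mz, n_freqs) @ w_bias  # (n, n) bias term
    attn = softmax(logits + bias, axis=-1)             # bias added before softmax
    return attn @ v, attn


# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
n_peaks, d_model, n_freqs = 6, 8, 4
x = rng.normal(size=(n_peaks, d_model))
mz = np.sort(rng.uniform(100.0, 1500.0, size=n_peaks))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
w_bias = rng.normal(size=2 * n_freqs)

out, attn = pairwise_attention(x, mz, w_q, w_k, w_v, w_bias)
```

Because the bias enters only as an extra term in the attention logits, setting `w_bias` to zero recovers ordinary attention, which is one way to read the abstract's remark that the encoding is largely orthogonal to other transformer-based spectrum encoders.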

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7263/12235698/aee05d91f5cc/pr5c00063_0001.jpg
