Decoding the impact of neighboring amino acids on ESI-MS intensity output through deep learning.

Affiliations

Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, Aalborg 9220, Denmark.


Publication information

J Proteomics. 2024 Oct 30;309:105322. doi: 10.1016/j.jprot.2024.105322. Epub 2024 Sep 26.

Abstract

Peptide-level quantification using mass spectrometry (MS) is no trivial task, as the physicochemical properties of a peptide affect both its response and its detectability. The specific amino acid (AA) sequence affects these properties; however, the connection between sequence and intensity output remains poorly understood. In this work, we explore combinations of amino acid pairs (i.e., dimer motifs) to determine a potential relationship between the local amino acid environment and MS1 intensity. For this purpose, a deep learning (DL) model, consisting of an encoder-decoder with an attention mechanism, was built. The attention mechanism made it possible to identify the most relevant motifs. Specific patterns were consistently found to be important for predicting MS1 intensity: a bulky/aromatic and hydrophobic AA followed by a cationic AA, as well as consecutive bulky/aromatic and hydrophobic AAs. Correlating attention weights with mean MS1 intensities revealed that some important motifs, particularly those containing Trp, His, and Cys, were linked with low-responding peptides, whereas motifs containing Lys and most bulky hydrophobic AAs were often associated with high-responding peptides. Moreover, Asn-Gly was associated with low response. The model predicts MS1 response with a mean absolute percentage error of ∼11% and a Pearson correlation coefficient of ∼0.64. While dimer representation of peptide sequences did not improve predictive capacity compared to the single-AA representation used in earlier work, this work adds valuable insight for a better understanding of peptide response in MS analysis.

SIGNIFICANCE: Mass spectrometry is not inherently quantitative, and the response of a compound relies not only on its concentration but also on its molecular composition. For mass spectrometry-based analysis of peptides, such as in bottom-up proteomics, this implies that the response cannot be used directly to quantify individual peptides. Moreover, the dependency of the response on the amino acid sequence of individual peptides remains poorly understood. Using a deep learning model based on a recurrent neural network with an attention mechanism, we here investigate how the presence of dimer motifs within a peptide affects the MS1 response, through the analysis of intended equimolar peptide pools comprising almost 200,000 unique peptides in total. Not only do we identify certain dimer classes and specific dimers that substantially affect the MS1 response, but the model is also able to predict peptide intensity with low error rates within the independent test subset. The findings not only improve our understanding of the link between sequence and response for peptides but also highlight the potential of utilizing deep learning for developing methods allowing for absolute, label-free peptide quantification.
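To make the dimer representation and the two reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a peptide is split into overlapping dimers with a stride of one residue, that each of the 400 ordered residue pairs receives an integer index, and that the percentage error is the usual mean absolute percentage error. The names `to_dimers`, `encode`, `mape`, and `pearson` are hypothetical and chosen for illustration only.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

# Hypothetical vocabulary: each ordered residue pair gets an integer id (20 * 20 = 400).
DIMER_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}


def to_dimers(sequence):
    """Split a peptide into overlapping dimers (stride 1 is an assumption)."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]


def encode(sequence):
    """Map a peptide to the list of integer ids of its dimers."""
    return [DIMER_INDEX[d] for d in to_dimers(sequence)]


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be non-zero."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)
```

With this encoding, a peptide of length n yields n - 1 tokens, so a motif such as Asn-Gly appears as the single token `"NG"` that an attention layer could weight directly.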

