Reducing Peptide Sequence Bias in Quantitative Mass Spectrometry Data with Machine Learning.

Affiliations

Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States.

Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States.

Publication information

J Proteome Res. 2022 Jul 1;21(7):1771-1782. doi: 10.1021/acs.jproteome.2c00211. Epub 2022 Jun 13.

DOI:10.1021/acs.jproteome.2c00211

PMID:35696663

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9531543/

Abstract

Quantitative mass spectrometry measurements of peptides necessarily incorporate sequence-specific biases that reflect the behavior of the peptide during enzymatic digestion and liquid chromatography and in a mass spectrometer. These sequence-specific effects impair quantification accuracy, yielding peptide quantities that are systematically under- or overestimated. We provide empirical evidence for the existence of such biases, and we use a deep neural network, called Pepper, to automatically identify and reduce these biases. The model generalizes to new proteins and new runs within a related set of tandem mass spectrometry experiments, and the learned coefficients themselves reflect expected physicochemical properties of the corresponding peptide sequences. The resulting adjusted abundance measurements are more correlated with mRNA-based gene expression measurements than the unadjusted measurements. Pepper is suitable for data generated on a variety of mass spectrometry instruments and can be used with labeled or label-free approaches and with data-independent or data-dependent acquisition.
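The idea of a per-peptide coefficient can be illustrated with a toy calculation. The sketch below is not Pepper: Pepper uses a deep neural network, whereas this example only assumes a multiplicative bias model (additive in log space) and fits one coefficient per peptide directly from simulated sibling peptides of a single protein. The function name `adjust_sibling_peptides`, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def adjust_sibling_peptides(log_intensity):
    """Toy bias adjustment for peptides that all come from ONE protein.

    log_intensity: array of shape (n_peptides, n_runs) holding log abundances.
    Assumed model (an illustration, not the paper's network):
        log_intensity[p, r] = log_coef[p] + protein_level[r] + noise
    Returns (log_coef, adjusted).
    """
    log_intensity = np.asarray(log_intensity, dtype=float)
    # The run-wise median over sibling peptides stands in for the protein level.
    protein_level = np.median(log_intensity, axis=0, keepdims=True)
    # A peptide's bias is its average offset from that consensus across runs.
    log_coef = np.mean(log_intensity - protein_level, axis=1, keepdims=True)
    # Removing the bias pulls sibling peptides toward a common level.
    adjusted = log_intensity - log_coef
    return log_coef.ravel(), adjusted


# Simulated data: 6 sibling peptides measured in 20 runs (all values made up).
rng = np.random.default_rng(0)
n_peptides, n_runs = 6, 20
true_protein = rng.normal(10.0, 1.0, size=(1, n_runs))
true_bias = rng.normal(0.0, 1.5, size=(n_peptides, 1))
noise = rng.normal(0.0, 0.1, size=(n_peptides, n_runs))
observed = true_protein + true_bias + noise

log_coef, adjusted = adjust_sibling_peptides(observed)

# Disagreement among sibling peptides within each run, before and after.
spread_before = observed.std(axis=0).mean()
spread_after = adjusted.std(axis=0).mean()
print(f"mean sibling spread: {spread_before:.3f} -> {spread_after:.3f}")
```

Fitting coefficients directly, as here, cannot handle peptides from unseen proteins. The abstract reports that Pepper generalizes to new proteins and new runs within a related set of experiments.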




