Matching isotopic distributions from metabolically labeled samples.

Authors

McIlwain Sean, Page David, Huttlin Edward L, Sussman Michael R

Affiliation

Department of Computer Sciences, University of Wisconsin, Madison, WI, USA.

Publication

Bioinformatics. 2008 Jul 1;24(13):i339-47. doi: 10.1093/bioinformatics/btn190.

DOI:10.1093/bioinformatics/btn190

PMID:18586733

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2718665/

Abstract

MOTIVATION

In recent years, stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analyzing data from metabolic labeling experiments can be complicated because the spacing between the labeled and unlabeled forms of each peptide depends on its sequence and therefore varies from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations, however, it is necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural-abundance (light) isotopic distributions and their metabolically labeled (heavy) counterparts. The algorithm works in two stages: first, we annotate the isotopic peaks using a modified version of the IDM algorithm published in 2007; then we use a probabilistic classifier, supplemented by dynamic programming, to find the matched pairs of light and metabolically labeled isotopic distributions. Such a method is needed for high-throughput quantitative proteomic and metabolomic experiments measured via mass spectrometry.
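The sequence dependence of the light/heavy spacing can be illustrated with a minimal sketch. It assumes complete 15N metabolic labeling (the abstract does not name the label; 15N is the one used in two of the referenced Arabidopsis studies), so the spacing is the number of nitrogen atoms in the peptide times the 15N/14N mass difference, divided by the charge. The function names are hypothetical and are not taken from the authors' programs.

```python
# Assumption: complete 15N labeling. Mass difference between 15N and 14N in
# daltons, from standard isotope masses.
N15_MINUS_N14 = 15.0001088982 - 14.0030740048

# Nitrogen atoms per amino acid residue (one backbone N plus side-chain N).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "N": 2, "D": 1, "Q": 2, "K": 2, "E": 1, "M": 1, "H": 3, "F": 1, "R": 4,
    "Y": 1, "W": 2,
}


def nitrogen_count(sequence: str) -> int:
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(NITROGENS[aa] for aa in sequence)


def heavy_light_spacing(sequence: str, charge: int = 1) -> float:
    """m/z spacing between the fully 15N-labeled and unlabeled forms."""
    return nitrogen_count(sequence) * N15_MINUS_N14 / charge


if __name__ == "__main__":
    # Two peptides of equal length but different nitrogen content.
    for peptide in ("PEPTIDE", "SAMPLER"):
        print(peptide, nitrogen_count(peptide),
              round(heavy_light_spacing(peptide), 4))
```

Both example peptides have seven residues, yet they contain 7 and 10 nitrogen atoms respectively, so their heavy forms sit at different offsets from the light forms. Without the sequence, that offset is unknown, which is the matching problem the abstract describes.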

RESULTS

The primary result of this article is that the dynamic programming approach performs well given perfect isotopic distribution annotations: under that condition, our algorithm achieves a true positive rate of 99% and a false positive rate of 1%. When the isotopic distributions are annotated from 'expert'-selected peaks, the same algorithm achieves a true positive rate of 77% and a false positive rate of 1%. Finally, when annotating from machine-selected peaks, which may contain noise, the dynamic programming algorithm gives a true positive rate of 36% and a false positive rate of 1%. These rates reflect the requirement that both the light and heavy isotopic distributions be annotated exactly: in our evaluations, a match is considered 'entirely incorrect' if it is missing even one peak or contains an extraneous peak. If we only require that the 'monoisotopic' peaks exist within the two matched distributions, our algorithm obtains a true positive rate of 45% and a false positive rate of 1% on the 'machine'-selected data. Changes to the algorithm's scoring function and training example generation improve the 'monoisotopic' peak true positive rate to 65%, with a false positive rate of 2%. All results were obtained with 10-fold cross-validation on 41 mass spectra with a mass-to-charge range of 800-4000 m/z. A total of 713 isotopic distributions and 255 matched isotopic pairs were hand-annotated for this study.
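The difference between the strict and the 'monoisotopic' evaluation criteria can be sketched as follows. This is an illustrative reading of the abstract, not the authors' evaluation code. It assumes each isotopic distribution is a set of peak indices into the spectrum's m/z-sorted peak list, and that the monoisotopic peak is the lowest-index peak of the annotated distribution. All names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Tuple

Distribution = FrozenSet[int]              # peak indices of one distribution
Pair = Tuple[Distribution, Distribution]   # (light, heavy)


def strict_match(pred: Pair, truth: Pair) -> bool:
    """Correct only if both distributions have exactly the annotated peaks."""
    return pred[0] == truth[0] and pred[1] == truth[1]


def monoisotopic_match(pred: Pair, truth: Pair) -> bool:
    """Correct if each annotated monoisotopic peak (assumed here to be the
    lowest-index peak) appears in the corresponding predicted distribution."""
    return min(truth[0]) in pred[0] and min(truth[1]) in pred[1]


def true_positive_rate(preds: List[Pair], truths: List[Pair],
                       criterion: Callable[[Pair, Pair], bool]) -> float:
    """Fraction of annotated pairs recovered by at least one prediction."""
    hits = sum(any(criterion(p, t) for p in preds) for t in truths)
    return hits / len(truths)


if __name__ == "__main__":
    truths = [(frozenset({10, 11, 12}), frozenset({20, 21, 22}))]
    # The prediction misses one peak (index 12) of the light distribution.
    preds = [(frozenset({10, 11}), frozenset({20, 21, 22}))]
    print(true_positive_rate(preds, truths, strict_match))
    print(true_positive_rate(preds, truths, monoisotopic_match))
```

In this toy case the single missing peak makes the pair a miss under the strict criterion but a hit under the monoisotopic one. That is the kind of gap the abstract reports between the 36% and 45% true positive rates on machine-selected peaks.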

AVAILABILITY

Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM/.

Similar articles

Using dynamic programming to create isotopic distribution maps from mass spectra.

Bioinformatics. 2007 Jul 1;23(13):i328-36. doi: 10.1093/bioinformatics/btm198.

An automated method for the analysis of stable isotope labeling data in proteomics.

J Am Soc Mass Spectrom. 2005 Jul;16(7):1181-91. doi: 10.1016/j.jasms.2005.03.016.

Minimizing the overlap problem in protein NMR: a computational framework for precision amino acid labeling.

Bioinformatics. 2007 Nov 1;23(21):2829-35. doi: 10.1093/bioinformatics/btm406. Epub 2007 Sep 25.

Data reduction of isotope-resolved LC-MS spectra.

Bioinformatics. 2007 Jun 1;23(11):1394-400. doi: 10.1093/bioinformatics/btm083. Epub 2007 May 11.

A suffix tree approach to the interpretation of tandem mass spectra: applications to peptides of non-specific digestion and post-translational modifications.

Bioinformatics. 2003 Oct;19 Suppl 2:ii113-21. doi: 10.1093/bioinformatics/btg1068.

Quantification of peptide m/z distributions from 13C-labeled cultures with high-resolution mass spectrometry.

Anal Chem. 2014 Feb 4;86(3):1894-901. doi: 10.1021/ac403985w. Epub 2014 Jan 21.

Rapid validation of Mascot search results via stable isotope labeling, pair picking, and deconvolution of fragmentation patterns.

Mol Cell Proteomics. 2009 Aug;8(8):2011-22. doi: 10.1074/mcp.M800472-MCP200. Epub 2009 May 11.

Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum/peptide sequence false match frequencies.

Bioinformatics. 2007 Sep 1;23(17):2210-7. doi: 10.1093/bioinformatics/btm267. Epub 2007 May 17.

A model-based method for the prediction of the isotopic distribution of peptides.

J Am Soc Mass Spectrom. 2008 May;19(5):703-12. doi: 10.1016/j.jasms.2008.01.009. Epub 2008 Jan 31.

Cited by

Leveraging proteomics to understand plant-microbe interactions.

Front Plant Sci. 2012 Mar 8;3:44. doi: 10.3389/fpls.2012.00044. eCollection 2012.

Prion disease diagnosis by proteomic profiling.

J Proteome Res. 2009 Feb;8(2):1030-6. doi: 10.1021/pr800832s.

References

An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.

J Am Soc Mass Spectrom. 1994 Nov;5(11):976-89. doi: 10.1016/1044-0305(94)80016-2.

Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions.

J Am Soc Mass Spectrom. 1995 Apr;6(4):229-33. doi: 10.1016/1044-0305(95)00017-8.

Stable isotope assisted assignment of elemental compositions for metabolomics.

Anal Chem. 2007 Sep 15;79(18):6912-21. doi: 10.1021/ac070346t. Epub 2007 Aug 21.

Using dynamic programming to create isotopic distribution maps from mass spectra.

Bioinformatics. 2007 Jul 1;23(13):i328-36. doi: 10.1093/bioinformatics/btm198.

ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry.

Nucleic Acids Res. 2007 Jul;35(Web Server issue):W701-6. doi: 10.1093/nar/gkm371. Epub 2007 Jun 22.

Implications of 15N-metabolic labeling for automated peptide identification in Arabidopsis thaliana.

Proteomics. 2007 Apr;7(8):1279-92. doi: 10.1002/pmic.200600832.

Comparison of full versus partial metabolic labeling for quantitative proteomics analysis in Arabidopsis thaliana.

Mol Cell Proteomics. 2007 May;6(5):860-81. doi: 10.1074/mcp.M600347-MCP200. Epub 2007 Feb 9.

Mass spectrometry and protein analysis.

Science. 2006 Apr 14;312(5771):212-7. doi: 10.1126/science.1124619.

Efficient calculation of accurate masses of isotopic peaks.

J Am Soc Mass Spectrom. 2006 Mar;17(3):415-9. doi: 10.1016/j.jasms.2005.12.001. Epub 2006 Feb 3.

Assessing the effects of diurnal variation on the composition of human parotid saliva: quantitative analysis of native peptides using iTRAQ reagents.

Anal Chem. 2005 Aug 1;77(15):4947-54. doi: 10.1021/ac050161r.
