Peak intensity prediction in MALDI-TOF mass spectrometry: a machine learning study to support quantitative proteomics.

Authors

Timm Wiebke, Scherbart Alexandra, Böcker Sebastian, Kohlbacher Oliver, Nattkemper Tim W

Affiliation

Applied Neuroinformatics Group, Bielefeld University, Germany.

Publication

BMC Bioinformatics. 2008 Oct 20;9:443. doi: 10.1186/1471-2105-9-443.

DOI:10.1186/1471-2105-9-443

PMID:18937839

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2600826/

Abstract

BACKGROUND

Mass spectrometry is a key technique in proteomics and can be used to analyze complex samples quickly. One key problem with the mass spectrometric analysis of peptides and proteins, however, is the fact that absolute quantification is severely hampered by the unclear relationship between the observed peak intensity and the peptide concentration in the sample. While there are numerous approaches to circumvent this problem experimentally (e.g. labeling techniques), reliable prediction of the peak intensities from peptide sequences could provide a peptide-specific correction factor. Thus, it would be a valuable tool towards label-free absolute quantification.

RESULTS

In this work we present machine learning techniques for peak intensity prediction for MALDI mass spectra. Features encoding the peptides' physico-chemical properties as well as string-based features were extracted. A feature subset was obtained from multiple forward feature selections on the extracted features. Based on these features, two advanced machine learning methods (support vector regression and local linear maps) are shown to yield good results for this problem (Pearson correlation of 0.68 in a ten-fold cross validation).
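The evaluation named above (a Pearson correlation obtained in ten-fold cross validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a k-nearest-neighbour regressor stands in for support vector regression and local linear maps, the two-column feature matrix is synthetic toy data rather than peptide features, and pooling the out-of-fold predictions before computing a single correlation is an assumption, since the abstract does not say whether correlations were pooled or averaged per fold. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def knn_predict(train_X, train_y, query, k=3):
    """Stand-in regressor (assumption): mean target of the k nearest points."""
    dists = sorted(
        (math.dist(row, query), target) for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_val_pearson(X, y, n_folds=10, seed=0):
    """Predict every sample from the folds that exclude it, then correlate."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    preds = [0.0] * len(X)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            preds[i] = knn_predict(train_X, train_y, X[i])
    # Pooled out-of-fold predictions versus observed targets (assumption).
    return pearson(preds, y)


# Synthetic toy data: two made-up features and a noisy linear "intensity".
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [2.0 * a - b + rng.gauss(0, 0.1) for a, b in X]

r = cross_val_pearson(X, y)
print(f"10-fold cross-validated Pearson r = {r:.2f}")
```

Because each prediction comes from a model that never saw the sample it predicts, the resulting correlation estimates how well the predictor generalises to unseen peptides rather than how well it fits the training data.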

CONCLUSION

The techniques presented here are a useful first step beyond the binary prediction of proteotypic peptides, towards a more quantitative prediction of peak intensities. These predictions should, in turn, prove beneficial for mass spectrometry-based quantitative proteomics.
