Peptide mass fingerprinting peak intensity prediction: extracting knowledge from spectra.

Author information

Gay Steven, Binz Pierre-Alain, Hochstrasser Denis F, Appel Ron D

Affiliation

Swiss Institute of Bioinformatics, Geneva, Switzerland.

Publication information

Proteomics. 2002 Oct;2(10):1374-91. doi: 10.1002/1615-9861(200210)2:10<1374::AID-PROT1374>3.0.CO;2-D.

Abstract

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry has become a valuable tool in proteomics. With the increasing acquisition rate of mass spectrometers, one of the major issues is the development of accurate, efficient and automatic peptide mass fingerprinting (PMF) identification tools. Current tools are mostly based on counting the number of experimental peptide masses that match theoretical masses. Almost all of them use additional criteria such as isoelectric point, molecular weight, post-translational modifications (PTMs), taxonomy or enzymatic cleavage rules to enhance prediction performance. However, these identification tools seldom use peak intensities as a parameter, because there is currently no model that predicts intensities from the physicochemical properties of peptides. In this work, we used standard data mining methods, namely classification and regression, to find correlations between peak intensities and the properties of the peptides composing a PMF spectrum. These methods were applied to a dataset comprising a series of PMF experiments involving 157 proteins. We found that the C4.5 method gave the most informative results for the classification task (predicting the presence or absence of a peptide in a spectrum) and M5' for the regression task (predicting the normalized intensity of a peptide peak). The C4.5 decision tree correctly classified 88% of the theoretical peaks, whereas the peak intensities predicted by M5' had a correlation coefficient of 0.6743 with the experimental peak intensities. These methods enabled us to obtain decision and model trees that can be used directly for prediction and identification of PMF results. This work lays the foundations of a method for analyzing the factors that influence peak intensity in PMF spectra. A simple extension of this analysis, using a larger dataset, could improve the accuracy of the results. Additional peptide characteristics or even PMF experimental parameters can also be taken into account in the data mining process to analyze their influence on peak intensity. Furthermore, this data mining approach can certainly be extended to tandem mass spectrometry or other mass spectrometry-derived methods.
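
The two figures reported in the abstract (the fraction of theoretical peaks classified correctly, and the correlation between predicted and experimental intensities) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names and input values are hypothetical, and the abstract does not say which correlation coefficient was used, so Pearson's is assumed here.

```python
from math import sqrt


def accuracy(predicted, observed):
    """Fraction of theoretical peaks whose presence/absence label is predicted correctly."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(predicted)


def pearson(xs, ys):
    """Pearson correlation between predicted and experimental normalized intensities.

    Assumption: the abstract reports a 'correlation coefficient' without naming it;
    Pearson's r is used here purely for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical labels: True means the peptide peak is present in the spectrum.
    predicted_presence = [True, False, True, True]
    observed_presence = [True, False, False, True]
    print(accuracy(predicted_presence, observed_presence))

    # Hypothetical normalized intensities, not data from the study.
    predicted_intensity = [0.10, 0.40, 0.35, 0.80]
    observed_intensity = [0.15, 0.30, 0.50, 0.70]
    print(pearson(predicted_intensity, observed_intensity))
```

In this framing, the classification task is scored over every theoretical peptide of a protein, while the regression task is scored on normalized intensities.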
