Di Silvestre Dario, Zoppis Italo, Brambilla Francesca, Bellettato Valeria, Mauri Giancarlo, Mauri Pierluigi
Institute for Biomedical Technologies (ITB-CNR), via F.lli Cervi 93, Segrate (Milan), Italy.
Department of Informatics, Systems and Communication, Viale Sarca 336, University of Milano-Bicocca, Milan, Italy.
J Clin Bioinforma. 2013 Jan 14;3(1):1. doi: 10.1186/2043-9113-3-1.
Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help provide an unambiguous diagnosis from biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained with the MudPIT approach. In particular, we compared the performance of SVM on two independent collections of complex samples and on different data types: mass spectra (m/z), peptides, and proteins.
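The classification step described above can be illustrated with a minimal sketch, assuming a linear SVM trained by sub-gradient descent on the hinge loss. This is not the authors' implementation: the spectral-count values, the two-protein feature layout, the phenotype labels, and the names `train_linear_svm` and `predict` are all invented for illustration.

```python
# Toy spectral-count matrix: rows are samples, columns are two hypothetical
# proteins. Every value and label here is invented for illustration.
SAMPLES = [
    [10.0, 1.0], [12.0, 2.0], [9.0, 0.0],   # phenotype +1
    [1.0, 9.0], [2.0, 11.0], [0.0, 10.0],   # phenotype -1
]
LABELS = [1, 1, 1, -1, -1, -1]


def train_linear_svm(samples, labels, epochs=100, lr=0.01, lam=0.01):
    """Fit weights and bias by sub-gradient descent on the L2-regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # The L2 penalty shrinks the weights on every step.
            weights = [w * (1.0 - lr * lam) for w in weights]
            if margin < 1.0:
                # The sample violates the margin: push the hyperplane away from it.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    weights, bias = train_linear_svm(SAMPLES, LABELS)
    print("weights:", weights, "bias:", bias)
    print("predictions:", [predict(weights, bias, x) for x in SAMPLES])
```

In a linear model of this kind, the magnitude of each weight gives a rough indication of how much the corresponding feature contributes to the separation, which is one common way of ranking features; the abstract does not state which selection procedure was used.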
Overall, protein and peptide data carried more discriminative information than experimental mass spectra (overall accuracy higher than 87% in both collections 1 and 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra and allows the extraction of more informative features for the effective classification of samples. In addition, the protein and peptide features selected by SVM showed an 80% match with the differentially expressed proteins identified by the MAProMa software.
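As a rough illustration of how a match percentage between two feature lists can be computed, the sketch below takes the fraction of SVM-selected features that also appear in a reference list of differentially expressed proteins. The abstract does not say which list serves as the denominator, so that choice is an assumption, as are the function name `overlap_fraction` and the placeholder protein identifiers.

```python
def overlap_fraction(selected, reference):
    """Fraction of `selected` features that also appear in `reference`."""
    selected, reference = set(selected), set(reference)
    if not selected:
        raise ValueError("no selected features to compare")
    return len(selected & reference) / len(selected)


# Hypothetical identifiers, invented for illustration only.
svm_selected = ["protein_A", "protein_B", "protein_C", "protein_D", "protein_E"]
differential = ["protein_A", "protein_B", "protein_C", "protein_D", "protein_F"]

if __name__ == "__main__":
    print(f"match: {overlap_fraction(svm_selected, differential):.0%}")
```

Using the reference list as the denominator instead would answer a different question: how many of the differentially expressed proteins the SVM recovered.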
These findings support the applicability of most label-free quantitative methods based on processing spectral count and SEQUEST-based SCORE values. They also stress the usefulness of MudPIT data for correctly grouping sample phenotypes by applying both supervised and unsupervised learning algorithms. This capability permits the evaluation of actual samples and is a good starting point for translating proteomic methodology into clinical application.