Wong Jason W H, Sullivan Matthew J, Cartwright Hugh M, Cagney Gerard
Chemistry Department, Oxford University, Physical and Theoretical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, UK.
BMC Bioinformatics. 2007 Feb 9;8:51. doi: 10.1186/1471-2105-8-51.
In proteomics experiments, database-search programs are the method of choice for identifying proteins from tandem mass spectra. As amino acid sequence databases grow, however, the computing resources these programs require have become prohibitive, particularly in searches for modified proteins. Several research groups have recently proposed limiting the number of spectra to be searched on the basis of spectral quality, but rankings of spectral quality have so far relied on arbitrary cut-off values. In this work, we develop a more readily interpretable spectral quality statistic by assigning each spectrum a probability that it will be identifiable.
We describe an application, msmsEval, that builds on previous work by statistically modeling the spectral quality discriminant function with a Gaussian mixture model. This allows a researcher to filter spectra according to the probability that a spectrum will ultimately be identified by database searching. We show that spectra predicted by msmsEval to be of high quality, yet left unidentified by standard database searches, are candidates for more intensive search strategies. Using a well-studied public dataset, we also show that a high proportion (83.9%) of the spectra that msmsEval predicts to be of high quality, but that elude standard search strategies, are in fact interpretable.
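The idea of turning a discriminant score into a probability can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to a single discriminant score and that the scores follow a two-component, one-dimensional Gaussian mixture in which the higher-scoring component corresponds to identifiable spectra. The function names, the initialisation, and the plain EM fit are illustrative assumptions; this is not msmsEval's actual code or interface.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def probability_identifiable(score, params):
    """Posterior probability that a score belongs to the high-score component.

    Assumption: component 1 (higher mean) models identifiable spectra.
    """
    w, m, v = params["weights"], params["means"], params["variances"]
    p0 = w[0] * normal_pdf(score, m[0], v[0])
    p1 = w[1] * normal_pdf(score, m[1], v[1])
    total = p0 + p1
    return p1 / total if total > 0.0 else 0.5


def fit_two_component_gmm(scores, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture to scores by EM (hypothetical helper)."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Initialise component 0 on the lower half of the scores, component 1 on the upper half.
    groups = [ordered[:half], ordered[half:]]
    means = [sum(g) / len(g) for g in groups]
    variances = [
        max(sum((x - mu) ** 2 for x in g) / len(g), 1e-6)
        for g, mu in zip(groups, means)
    ]
    params = {"weights": [0.5, 0.5], "means": means, "variances": variances}
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of the high-score component for every spectrum.
        resp = [probability_identifiable(x, params) for x in scores]
        # M-step: re-estimate weights, means and variances from the responsibilities.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - sum(resp), 1e-12)
        mean0 = sum((1.0 - r) * x for r, x in zip(resp, scores)) / n0
        mean1 = sum(r * x for r, x in zip(resp, scores)) / n1
        var0 = sum((1.0 - r) * (x - mean0) ** 2 for r, x in zip(resp, scores)) / n0
        var1 = sum(r * (x - mean1) ** 2 for r, x in zip(resp, scores)) / n1
        params = {
            "weights": [n0 / (n0 + n1), n1 / (n0 + n1)],
            "means": [mean0, mean1],
            "variances": [max(var0, 1e-6), max(var1, 1e-6)],
        }
    return params


if __name__ == "__main__":
    # Synthetic scores only; these are not data from the study.
    random.seed(0)
    demo = [random.gauss(-2.0, 1.0) for _ in range(300)]
    demo += [random.gauss(3.0, 1.0) for _ in range(200)]
    fitted = fit_two_component_gmm(demo)
    # Keep spectra whose posterior probability exceeds a user-chosen threshold.
    kept = [s for s in demo if probability_identifiable(s, fitted) >= 0.9]
    print(len(kept), "of", len(demo), "synthetic spectra pass the 0.9 filter")
```

Filtering on a posterior probability rather than on the raw score is what gives the threshold a direct interpretation: a cut-off of 0.9 means that, under the fitted mixture, a retained spectrum has at least a 90% chance of belonging to the identifiable population.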
msmsEval will be useful for high-throughput proteomics projects and is freely available for download from http://proteomics.ucd.ie/msmseval. It supports the Windows, Mac OS X and Linux/Unix operating systems.