Division of Biological Chemistry and Drug Discovery, College of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK.
Bioinformatics. 2010 Sep 1;26(17):2153-9. doi: 10.1093/bioinformatics/btq341. Epub 2010 Jul 22.
Complex patterns of protein phosphorylation mediate many cellular processes. Tandem mass spectrometry (MS/MS) is a powerful tool for identifying these post-translational modifications. In high-throughput experiments, mass spectrometry database search engines such as MASCOT provide a ranked list of peptide identifications based on the hundreds of thousands of MS/MS spectra obtained in a mass spectrometry experiment. These search results are not in themselves sufficient for confident assignment of phosphorylation sites, because identifying the characteristic mass differences requires time-consuming manual assessment of the spectra by an experienced analyst. The time required for this manual assessment has previously made confident high-throughput assignment of phosphorylation sites challenging.
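As a rough illustration of the "characteristic mass differences" mentioned above, the following sketch computes where a neutral loss of phosphoric acid would be expected relative to a precursor ion. It is not the authors' Perl implementation: the function name and the example precursor are hypothetical, and only the mass constants (standard monoisotopic values for HPO3 and H3PO4) are established facts.

```python
# Monoisotopic mass differences (Da) commonly associated with phosphopeptides.
PHOSPHO_SHIFT = 79.96633        # HPO3 added to a phosphorylated residue
NEUTRAL_LOSS_H3PO4 = 97.97690   # H3PO4 lost on fragmentation


def neutral_loss_mz(precursor_mz: float, charge: int,
                    loss: float = NEUTRAL_LOSS_H3PO4) -> float:
    """Return the m/z expected after a neutral loss from the precursor.

    Hypothetical helper for illustration: a neutral loss removes mass but
    not charge, so the observed shift in m/z is loss / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss / charge


if __name__ == "__main__":
    # Hypothetical doubly charged precursor at m/z 600.0.
    print(f"{neutral_loss_mz(600.0, 2):.5f}")
```

An analyst, or an automated rule, would then look for a peak near this m/z within the instrument's mass tolerance.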
We have developed a knowledge base of criteria that replicate expert assessment, allowing more than half of cases to be validated automatically and site assignments to be verified with a high degree of confidence. This was assessed by comparing automated spectral interpretation with careful manual examination of the assignments for 501 peptides above the 1% false discovery rate (FDR) threshold, corresponding to 259 putative phosphorylation sites in 74 proteins of the Trypanosoma brucei proteome. Despite this stringent approach, we are able to validate 80 of the 91 phosphorylation sites (88%) positively identified by manual examination of the spectra used for the MASCOT searches, with an FDR < 15%.
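The comparison against manual examination reduces to two simple ratios, sketched below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the FDR definition used here (false positives over all positive calls) is an assumption, since the abstract does not state how the figure was computed or give the underlying counts.

```python
def sensitivity(validated: int, manually_confirmed: int) -> float:
    """Fraction of manually confirmed sites that automation also validated."""
    if manually_confirmed <= 0:
        raise ValueError("manually_confirmed must be positive")
    return validated / manually_confirmed


def false_discovery_rate(false_positives: int, true_positives: int) -> float:
    """Assumed definition: FP / (FP + TP) among automatically validated sites."""
    total = false_positives + true_positives
    if total <= 0:
        raise ValueError("no positive calls to evaluate")
    return false_positives / total


if __name__ == "__main__":
    # Counts reported in the abstract: 80 of 91 manually confirmed sites.
    print(f"{sensitivity(80, 91):.1%}")
```

With the reported counts, 80 of 91 gives roughly 0.879, which the abstract rounds to 88%.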
High-throughput computational analysis can provide a viable second stage validation of primary mass spectrometry database search results. Such validation gives rapid access to a systems level overview of protein phosphorylation in the experiment under investigation.
A GPL-licensed software implementation in Perl for analysis and spectrum annotation is available in the supplementary material, and a web server can be accessed online at http://www.compbio.dundee.ac.uk/prophossi.