Liu Chunmei, Yan Bo, Song Yinglei, Xu Ying, Cai Liming
Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
Bioinformatics. 2006 Jul 15;22(14):e307-13. doi: 10.1093/bioinformatics/btl226.
An important but difficult problem in proteomics is the identification of post-translational modifications (PTMs) in a protein. In general, identifying PTMs by aligning experimental spectra with theoretical spectra from peptides in a peptide database is very time-consuming and may lead to a high false-positive rate. In this paper, we introduce a new approach that is both efficient and effective for blind PTM identification. Our work consists of the following phases. First, we develop a novel tree-decomposition-based algorithm that can efficiently generate peptide sequence tags (PSTs) from an extended spectrum graph. Sequence tags are selected from all maximum weighted antisymmetric paths in the graph, and their reliabilities are evaluated with a score function. An efficient deterministic finite automaton (DFA)-based model is then developed to search a peptide database for candidate peptides using the generated sequence tags. Finally, a point process model, an efficient blind search approach for PTM identification, is applied to report the correct peptide and its PTMs, if any. Our tests on 2657 experimental tandem mass spectra and 2620 experimental spectra with one artificially added PTM show that, in addition to high efficiency, our ab initio sequence tag selection algorithm achieves accuracy better than or comparable to that of other approaches. Database search results show that sequence tags of lengths 3 and 4 filter out more than 98.3% and 99.8% of the peptides, respectively, when applied to a yeast peptide database. With the dramatically reduced search space, the point process model achieves a significant improvement in accuracy as well.
The software is available upon request.
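The tag-based database filtering step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' DFA model, which the abstract does not specify. It swaps in a standard Aho-Corasick automaton, a well-known finite-automaton technique for matching many patterns in one pass, and treats tags as plain substrings. The function names (`build_automaton`, `tags_found`, `filter_candidates`), the `min_hits` parameter, and the example peptides are all hypothetical. A real search would likely also need to handle mass offsets, reversed tags, and residues of equal mass.

```python
from collections import deque


def build_automaton(tags):
    """Build an Aho-Corasick automaton (goto, fail, out) for a set of tags.

    Assumption: tags are plain amino-acid strings matched as substrings.
    """
    goto = [{}]    # goto[state][char] -> next state
    out = [set()]  # tags recognised on entering each state
    for tag in tags:
        state = 0
        for ch in tag:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(tag)

    # Failure links, computed breadth-first; depth-1 states fail to the root.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit tags ending at the suffix state
            queue.append(nxt)
    return goto, fail, out


def tags_found(peptide, automaton):
    """Return the set of tags that occur in the peptide (single pass)."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in peptide:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found


def filter_candidates(peptides, tags, min_hits=1):
    """Keep peptides containing at least `min_hits` distinct tags."""
    automaton = build_automaton(tags)
    return [p for p in peptides if len(tags_found(p, automaton)) >= min_hits]


if __name__ == "__main__":
    # Hypothetical toy database and tags, for illustration only.
    database = ["ACDEFGHIK", "LMNPQRSTK", "GHIKLMNPR"]
    print(filter_candidates(database, ["DEF", "HIKL"]))
```

Because the automaton is built once and each peptide is scanned in a single pass, the cost per peptide grows with its length rather than with the number of tags, which is the usual reason for preferring an automaton over checking each tag separately.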