Ritz Anna, Shakhnarovich Gregory, Salomon Arthur R, Raphael Benjamin J
Department of Computer Science, Brown University, Providence, RI, USA; Toyota Technological Institute at Chicago, Chicago, IL, USA.
Bioinformatics. 2009 Jan 1;25(1):14-21. doi: 10.1093/bioinformatics/btn569. Epub 2008 Nov 7.
Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part through particular patterns, or motifs, present in the amino acid sequence.
We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments. Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether or not the phosphoproteins containing an instance of a motif have a significant number of known interactions. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases or phosphatases.
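The abstract does not give the encoding scheme, so the following is only a toy sketch of how a minimum-description-length criterion can select a mixture of motifs; it is not the authors' algorithm. It assumes equal-length peptides centred on the phosphorylated residue, represents a motif as a hypothetical `frozenset` of `(offset, residue)` pairs, prices free residues with background amino-acid frequencies, and greedily adds single- or two-position motifs while the total description length (model bits plus data bits) keeps shrinking. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def background_freqs(peptides, center, pseudocount=1.0):
    """Smoothed residue frequencies from background peptides (central residue excluded)."""
    counts = Counter(aa for p in peptides for i, aa in enumerate(p) if i != center)
    total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}

def peptide_cost(peptide, motifs, freqs, center):
    """Bits for one peptide: which motif (or background) explains it, then its free residues."""
    choice_bits = math.log2(len(motifs) + 1)
    best = sum(-math.log2(freqs[aa]) for i, aa in enumerate(peptide) if i != center)
    for motif in motifs:
        if all(peptide[center + off] == aa for off, aa in motif):
            fixed = {center + off for off, _ in motif}
            bits = sum(-math.log2(freqs[aa]) for i, aa in enumerate(peptide)
                       if i != center and i not in fixed)
            best = min(best, bits)  # residues fixed by the motif cost nothing
    return choice_bits + best

def model_cost(motifs, width):
    """Toy bits to describe the motif set: a size, then an offset and residue per position."""
    per_position = math.log2(width) + math.log2(len(AMINO_ACIDS))
    return sum(math.log2(width) + len(m) * per_position for m in motifs)

def total_dl(motifs, foreground, freqs, center, width):
    """Total description length: model bits plus foreground data bits."""
    data = sum(peptide_cost(p, motifs, freqs, center) for p in foreground)
    return model_cost(motifs, width) + data

def candidate_motifs(foreground, center, min_support=3):
    """One- and two-position patterns seen in at least `min_support` foreground peptides."""
    counts = Counter()
    for p in foreground:
        positions = [(i - center, aa) for i, aa in enumerate(p) if i != center]
        for k in (1, 2):
            for combo in combinations(positions, k):
                counts[frozenset(combo)] += 1
    return [m for m, c in counts.items() if c >= min_support]

def greedy_mdl_motifs(foreground, background, min_support=3):
    """Greedily add the motif that most reduces total description length; stop when none does."""
    width = len(foreground[0])
    center = width // 2
    freqs = background_freqs(background, center)
    motifs = []
    best = total_dl(motifs, foreground, freqs, center, width)
    candidates = candidate_motifs(foreground, center, min_support)
    while True:
        scored = [(total_dl(motifs + [c], foreground, freqs, center, width), c)
                  for c in candidates if c not in motifs]
        if not scored:
            break
        dl, cand = min(scored, key=lambda t: t[0])
        if dl >= best:
            break
        motifs.append(cand)
        best = dl
    return motifs, best
```

On a foreground that mixes, say, an RxxS-like group and an SP-like group, each planted pattern becomes its own motif because explaining those peptides saves more bits than the extra model and assignment cost; a pattern seen in only a few peptides does not pay for itself.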
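The abstract does not define the motif specificity score. One common way to ask whether motif-containing phosphoproteins have "a significant number of known interactions" with a given kinase or phosphatase is a one-sided hypergeometric test, sketched below as an assumption rather than the paper's formula. Here `universe` is a hypothetical set of all proteins considered, `partners` the known interactors of one kinase or phosphatase, and `motif_proteins` the proteins containing an instance of the motif; the score is reported as $-\log_{10} p$.

```python
import math

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    total = math.comb(N, n)
    # math.comb returns 0 when asked to choose more items than exist
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / total

def specificity_score(motif_proteins, partners, universe):
    """Hypothetical enrichment score: -log10 p of the motif/partner overlap."""
    universe = set(universe)
    motif_proteins = set(motif_proteins) & universe
    partners = set(partners) & universe
    overlap = len(motif_proteins & partners)
    p = hypergeom_sf(overlap, len(universe), len(partners), len(motif_proteins))
    return -math.log10(p), p
```

For example, if all 8 motif-containing proteins in a universe of 100 are among the 10 known partners of a kinase, the overlap is very unlikely by chance (p is about 2.4e-10), whereas an overlap of zero gives p = 1.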