Davey Norman E, Shields Denis C, Edwards Richard J
Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland.
Nucleic Acids Res. 2006 Jul 19;34(12):3546-54. doi: 10.1093/nar/gkl486. Print 2006.
Many important interactions of proteins are facilitated by short, linear motifs (SLiMs) within a protein's primary sequence. Our aim was to establish robust methods for discovering putative functional motifs. The strongest evidence for such motifs is obtained when the same motif occurs in unrelated proteins, having evolved by convergence. In practice, searches for such motifs are often swamped by motifs shared among related proteins that are identical by descent. Predictions of motifs among sets of biologically related proteins, including those both with and without detectable similarity, were made using the TEIRESIAS algorithm. The number of motif occurrences arising through common evolutionary descent was normalized based on treatment of BLAST local alignments. Motifs were ranked according to a score derived from the product of the normalized number of occurrences and the information content. When applied to known SLiMs from a subset of the eukaryotic linear motif (ELM) database, the method was shown to significantly outperform methods that do not discount evolutionary relatedness. An implementation of Multiple Spanning Tree weighting outperformed two other weighting schemes in a variety of settings.
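The ranking idea described above (normalized occurrences multiplied by information content) can be illustrated with a minimal sketch. It is not the authors' implementation. In place of the spanning-tree weighting named in the abstract, it substitutes a much cruder normalization that counts each cluster of related proteins once. It also assumes a uniform background of 20 amino acids and a motif syntax in which `.` is a wildcard. The function names, the protein identifiers, and the `related_pairs` input (standing in for pairs linked by BLAST local alignments) are all hypothetical.

```python
import math


def information_content(motif, background=None):
    """Sum -log2(p) over the fixed positions of a motif; '.' is a wildcard.

    Assumption: with no background supplied, every residue has p = 1/20.
    """
    total = 0.0
    for residue in motif:
        if residue == ".":
            continue  # wildcard positions contribute no information
        p = background[residue] if background else 1.0 / 20.0
        total += -math.log2(p)
    return total


def normalized_support(proteins, related_pairs):
    """Count clusters of related proteins among those containing the motif.

    Simplification (not the abstract's weighting scheme): proteins linked
    by any chain of related pairs are merged and counted as one occurrence.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in related_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(p) for p in proteins})


def motif_score(motif, proteins, related_pairs):
    """Score = normalized occurrence count x information content."""
    return normalized_support(proteins, related_pairs) * information_content(motif)


# Hypothetical example: four proteins contain the motif, two are homologous.
proteins = ["P1", "P2", "P3", "P4"]
related_pairs = [("P1", "P2")]
score = motif_score("P..P", proteins, related_pairs)
```

Under this simplification, a motif found only in a family of close homologues collapses to a support of 1, so it ranks below a motif of equal information content that recurs in unrelated proteins.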