Complex and Adaptive Systems Laboratory, University College Dublin, Ireland.
BMC Bioinformatics. 2012 May 18;13:104. doi: 10.1186/1471-2105-13-104.
Short linear protein motifs are attracting increasing attention as functionally independent sites, typically 3-10 amino acids in length, that are enriched in disordered regions of proteins. Multiple methods have recently been proposed to discover over-represented motifs within a set of proteins based on simple regular expressions. Here, we extend these approaches to profile-based methods, which provide a richer motif representation.
The profile motif discovery method MEME performed relatively poorly for motifs in disordered regions of proteins. However, when we applied evolutionary weighting to account for redundancy amongst homologous proteins and masked out poorly conserved regions of disordered proteins, the performance of MEME was equivalent to that of regular expression methods. Nevertheless, the two approaches returned different subsets of motifs in both a benchmark dataset and a more realistic discovery dataset.
Profile-based motif discovery methods complement regular expression-based methods. Whilst profile-based methods are computationally more intensive, they are likely to discover motifs currently overlooked by regular expression methods.
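To illustrate why a profile can recover motif instances that a regular expression misses, the following minimal sketch contrasts the two representations. It is not the MEME algorithm and not the method used in the study: the aligned instances, the regular expression `P[ST]AP`, the uniform background, the pseudocount, and the helper names `build_profile` and `best_profile_hit` are all assumptions chosen for illustration.

```python
import math
import re

# Hypothetical aligned motif instances (illustrative only, not data from the study).
INSTANCES = ["PTAP", "PSAP", "PTAP", "PSVP"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(instances, pseudocount=1.0):
    """Build per-position log-odds scores against a uniform background (an assumption)."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*instances):
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_profile_hit(profile, sequence):
    """Return (score, start) of the highest-scoring window in the sequence.

    The sequence must contain only the 20 standard amino acid letters.
    """
    width = len(profile)
    scored = [
        (sum(profile[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# A simple regular expression derived from the most common instances (assumed).
REGEX = re.compile(r"P[ST]AP")
profile = build_profile(INSTANCES)

for seq in ["MKLPTAPEE", "MKLPSVPEE"]:
    regex_hit = REGEX.search(seq) is not None
    score, start = best_profile_hit(profile, seq)
    print(f"{seq}: regex match={regex_hit}, best profile score={score:.2f} at position {start}")
```

In this toy case the regular expression rejects the variant `PSVP` outright, whereas the profile still assigns it a positive score, lower than that of the consensus-like `PTAP`. This graded scoring is the richer representation referred to above. It comes at the cost of more computation per candidate site.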