Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO 63110, USA.
IEEE/ACM Trans Comput Biol Bioinform. 2010 Jul-Sep;7(3):524-36. doi: 10.1109/TCBB.2008.101.
Motifs are overrepresented sequence or spatial patterns in proteins. They often play important roles in maintaining protein stability and facilitating protein function. Motifs are difficult to identify when they are located in short sequence fragments, such as transmembrane domains only 6-20 residues long, and when data are very limited. In this study, we introduce combinatorial models based on permutation for assessing statistically significant sequence and spatial patterns in short sequences. We show that our method can uncover previously unknown sequence and spatial motifs in beta-barrel membrane proteins and that it outperforms existing methods in detecting statistically significant motifs in this data set. Finally, we discuss the implications of motif analysis for problems involving short sequences in other protein families.
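The abstract does not spell out its combinatorial model, so the following is only a generic illustration of a permutation-based null for short fragments, not the authors' method. It assumes a "motif" is an ordered residue pair (x, y) at separation k inside one fragment (for example k = 2, since residues i and i+2 point to the same side of a beta-strand), and that the null hypothesis shuffles residues within each fragment, preserving its composition. The expected count under that null is computed exactly. The p-value, however, is estimated by Monte Carlo shuffling, which is a swapped-in technique. All function names and toy sequences are hypothetical.

```python
import random


def count_pair(seq: str, x: str, y: str, k: int) -> int:
    """Count positions i where seq[i] == x and seq[i + k] == y."""
    return sum(1 for i in range(len(seq) - k) if seq[i] == x and seq[i + k] == y)


def expected_pair_count(seq: str, x: str, y: str, k: int) -> float:
    """Exact expected count of the (x, y, k) pair when the residues of seq
    are randomly permuted within the fragment (composition preserved)."""
    if k < 1:
        raise ValueError("separation k must be at least 1")
    n = len(seq)
    if n - k <= 0:
        return 0.0
    nx, ny = seq.count(x), seq.count(y)
    # Ordered ways to place x then y at two distinct positions of the multiset.
    pairs = nx * (nx - 1) if x == y else nx * ny
    # Each of the (n - k) position pairs (i, i + k) holds (x, y) with
    # probability pairs / (n * (n - 1)) under a uniform random permutation.
    return (n - k) * pairs / (n * (n - 1))


def permutation_p_value(seqs: list[str], x: str, y: str, k: int,
                        n_perm: int = 10_000, seed: int = 0) -> tuple[int, float]:
    """Monte Carlo estimate of P(total count >= observed) when each fragment
    is shuffled independently. Returns (observed_total, p_value)."""
    rng = random.Random(seed)
    observed = sum(count_pair(s, x, y, k) for s in seqs)
    hits = 0
    for _ in range(n_perm):
        total = 0
        for s in seqs:
            residues = list(s)
            rng.shuffle(residues)  # shuffle within the fragment only
            total += count_pair("".join(residues), x, y, k)
        if total >= observed:  # one-sided test for overrepresentation
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy strands, not data from the study.
    strands = ["YVGAKYT", "GYRVNYE", "TYDLYKA"]
    expected = sum(expected_pair_count(s, "Y", "Y", 2) for s in strands)
    observed, p = permutation_p_value(strands, "Y", "Y", 2, n_perm=2000)
    print(f"observed={observed} expected={expected:.3f} p~{p:.4f}")
```

Shuffling within each fragment, rather than pooling residues across all fragments, keeps the null consistent with the per-fragment composition. This matters when fragments are only a handful of residues long.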