Pirovano Walter, Feenstra K Anton, Heringa Jaap
Centre for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands.
Nucleic Acids Res. 2006;34(22):6540-8. doi: 10.1093/nar/gkl901. Epub 2006 Nov 27.
Multiple sequence alignments are often used to reveal functionally important residues within a protein family. They can be particularly useful for identifying key residues that determine functional differences between protein subfamilies. We present a new entropy-based method, Sequence Harmony (SH), that accurately detects subfamily-specific positions from a multiple sequence alignment. The SH algorithm implements a novel formula that scores compositional differences between subfamilies in a simple manner and on an intuitive scale, without imposing conservation. We compare our method with the most important published methods, i.e. AMAS, TreeDet and SDP-pred, using three well-studied protein families: the receptor-binding domain (MH2) of the Smad family of transcription factors, the Ras superfamily of small GTPases and the MIP family of integral membrane transporters. We demonstrate that SH accurately selects known functional sites with higher coverage than the other methods for these test cases. This shows that compositional differences between protein subfamilies provide a sufficient basis for identifying functional sites. In addition, SH selects a number of sites of unknown function that could be interesting candidates for further experimental investigation.
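The abstract does not state the SH formula, so the following is only a minimal sketch of the general idea of scoring compositional differences between two subfamilies per alignment column. It assumes a stand-in score, one minus the Jensen-Shannon divergence (base 2) of the two residue distributions, which runs from 0 (no residue types shared) to 1 (identical composition). The function names (`column_frequencies`, `overlap_score`, `score_alignment`) and the gap handling are hypothetical choices for illustration, not the published implementation.

```python
import math
from collections import Counter

GAP_CHARS = {"-", "."}  # assumption: gaps are ignored when counting residues


def column_frequencies(column):
    """Return residue -> relative frequency for one alignment column."""
    residues = [c for c in column.upper() if c not in GAP_CHARS]
    if not residues:
        return {}
    total = len(residues)
    return {aa: n / total for aa, n in Counter(residues).items()}


def entropy(freqs):
    """Shannon entropy in bits of a frequency dictionary."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def overlap_score(col_a, col_b):
    """Stand-in score: 1 - Jensen-Shannon divergence of two columns.

    Returns None if either subfamily has only gaps at this position.
    """
    pa = column_frequencies(col_a)
    pb = column_frequencies(col_b)
    if not pa or not pb:
        return None
    # Equal-weight mixture of the two subfamily distributions.
    mix = {aa: 0.5 * (pa.get(aa, 0.0) + pb.get(aa, 0.0))
           for aa in set(pa) | set(pb)}
    jsd = entropy(mix) - 0.5 * (entropy(pa) + entropy(pb))
    return 1.0 - jsd


def score_alignment(group_a, group_b):
    """Score every column of two aligned subfamilies of equal length."""
    length = len(group_a[0])
    scores = []
    for i in range(length):
        col_a = "".join(seq[i] for seq in group_a)
        col_b = "".join(seq[i] for seq in group_b)
        scores.append(overlap_score(col_a, col_b))
    return scores


if __name__ == "__main__":
    # Toy alignment (made up for illustration): column 0 differs
    # completely between the groups, columns 1 and 2 do not.
    group_a = ["ACD", "ACE"]
    group_b = ["GCD", "GCE"]
    print(score_alignment(group_a, group_b))
```

Under this stand-in, low-scoring columns are the candidates for subfamily-specific positions. Note that a column can score low even when neither subfamily is conserved on its own, as long as the two compositions do not overlap, which matches the abstract's point about not imposing conservation.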