Abhiman Saraswathi, Daub Carsten O, Sonnhammer Erik L L
Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, Sweden.
Mol Biol Evol. 2006 Jul;23(7):1406-13. doi: 10.1093/molbev/msl002. Epub 2006 May 3.
Protein families typically embody a range of related functions and may thus be decomposed into subfamilies with, for example, distinct substrate specificities. Detection of functionally divergent subfamilies is possible by methods for recognizing branches of adaptive evolution in a gene tree. As the number of genome sequences is growing rapidly, it is highly desirable to automatically detect subfamily function divergence. To this end, we here introduce a method for large-scale prediction of function divergence within protein families. It is called the alpha shift measure (ASM) as it is based on detecting a shift in the shape parameter (alpha [α]) of the substitution rate gamma distribution. Four different methods for estimating alpha were investigated. We benchmarked the accuracy of ASM using function annotation from Enzyme Commission numbers within Pfam protein families divided into subfamilies by the automatic tree-based method BETE. In a test using 563 subfamily pairs in 162 families, ASM outperformed functional site-based methods using rate or conservation shifting (rate shift measure [RSM] and conservation shift measure [CSM]). The best results were obtained using the "GZ-Gamma" method for estimating alpha. By combining ASM with RSM and CSM using linear discriminant analysis, the prediction accuracy was further improved.
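The idea of a shift in the gamma shape parameter between two subfamilies can be illustrated with a minimal sketch. The abstract does not give the ASM formula or the details of the four alpha estimators, so everything here is an assumption for illustration: the per-site substitution rates are taken as already estimated, alpha is approximated with a simple method-of-moments estimator (a stand-in, not the "GZ-Gamma" method), and the shift is scored as the absolute difference in alpha. The function names `gamma_shape_moments` and `alpha_shift` are hypothetical.

```python
from statistics import mean, pvariance


def gamma_shape_moments(rates):
    """Method-of-moments estimate of the gamma shape parameter.

    For a gamma distribution, mean = alpha * scale and
    variance = alpha * scale**2, so alpha = mean**2 / variance.
    This is an illustrative stand-in, not the paper's estimator.
    """
    m = mean(rates)
    v = pvariance(rates)
    if v == 0:
        raise ValueError("rates have zero variance; alpha is undefined")
    return m * m / v


def alpha_shift(rates_a, rates_b):
    """Assumed shift score: absolute difference between the alpha
    estimates of two subfamilies (the paper's definition may differ)."""
    return abs(gamma_shape_moments(rates_a) - gamma_shape_moments(rates_b))


# Hypothetical per-site substitution rates for two subfamilies.
subfamily_a = [1.0, 1.0, 1.0, 3.0]
subfamily_b = [0.0, 2.0]
shift = alpha_shift(subfamily_a, subfamily_b)
```

A larger score under this sketch would mean that the two subfamilies differ more in how unevenly substitution rates are spread across sites, which is the signal the abstract associates with function divergence.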