Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5.
Bioinformatics. 2011 Oct 1;27(19):2655-63. doi: 10.1093/bioinformatics/btr470. Epub 2011 Aug 11.
To understand the evolution of molecular function within protein families, it is important to identify the amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods, both phylogenetic and information-theoretic, have been developed for identifying the two site types in protein subfamilies. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples.
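The distinction between the two site types can be illustrated with a minimal information-theoretic sketch. This is not the FunDi method and not any published predictor: the function names, the entropy thresholds and the coarse four-group property table are all assumptions made for illustration. A site is labelled type I-like when one subfamily is conserved and the other is variable, and type II-like when both are conserved but their dominant residues fall into different property groups.

```python
import math
from collections import Counter

# Coarse, hypothetical grouping of amino acids by chemical property
# (an assumption for illustration; real predictors use richer models).
GROUPS = {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQCY",
    "hydrophobic": "AVLIMFWPG",
}
PROPERTY = {aa: name for name, aas in GROUPS.items() for aa in aas}


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_site(col_a, col_b, conserved=0.5, variable=1.0):
    """Heuristic label for one site given its column in subfamilies A and B.

    The thresholds (in bits) are arbitrary illustrative assumptions.
    """
    h_a, h_b = column_entropy(col_a), column_entropy(col_b)
    if h_a <= conserved and h_b <= conserved:
        # Both subfamilies conserved: compare the dominant residues' properties.
        top_a = Counter(col_a).most_common(1)[0][0]
        top_b = Counter(col_b).most_common(1)[0][0]
        if PROPERTY.get(top_a, top_a) != PROPERTY.get(top_b, top_b):
            return "type II-like"
        return "conserved"
    if min(h_a, h_b) <= conserved and max(h_a, h_b) >= variable:
        # Conserved in one subfamily, variable in the other.
        return "type I-like"
    return "unclassified"


# Toy alignment: four sequences per subfamily, four sites each.
subfamily_a = ["GDLG", "GDLA", "GDLS", "GDLT"]
subfamily_b = ["GKLG", "GKLG", "GKLG", "GKLG"]

labels = [
    classify_site([s[i] for s in subfamily_a], [s[i] for s in subfamily_b])
    for i in range(len(subfamily_a[0]))
]
print(labels)
```

In this toy alignment the second site (D in one subfamily, K in the other) is flagged as type II-like and the fourth site (variable in one subfamily, invariant G in the other) as type I-like. A heuristic of this kind ignores the phylogeny entirely, which is one reason tree-based approaches are attractive.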
We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions.
http://rogerlab.biochem.dal.ca/Software
Supplementary data are available at Bioinformatics online.