Gu X
Department of Zoology/Genetics, Center for Bioinformatics and Biological Statistics, Iowa State University, Ames 50011, USA.
Mol Biol Evol. 2001 Apr;18(4):453-64. doi: 10.1093/oxfordjournals.molbev.a003824.
Based on the observed alignment pattern (i.e., amino acid configuration), we studied two basic types of functional divergence in a protein family. Type I functional divergence after gene duplication results in altered functional constraints (i.e., different evolutionary rates) between duplicate genes, whereas type II results in no altered functional constraints but a radical change in amino acid properties between them (e.g., charge or hydrophobicity). Two statistical approaches, the subtree likelihood and the whole-tree likelihood, were developed for estimating the coefficients of (type I or type II) functional divergence. Numerical algorithms for obtaining maximum-likelihood estimates are also provided. Moreover, a posterior-based site-specific profile is implemented to predict critical amino acid residues that are responsible for type I and/or type II functional divergence after gene duplication. Using examples, we compared the current likelihood approach with a fast method developed previously; both give similar results. Large gene families with many member genes (clusters) appear to be the normal case in postgenomics; for handling altered functional constraints (type I functional divergence) in such families, the subtree likelihood provides a solution that is computationally feasible and robust against uncertainty in the phylogeny. The cost of this feasibility is that the method is approximate when amino acid frequencies are very skewed. The potential bias and its correction are discussed.
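To illustrate how a posterior-based site-specific profile relates to a maximum-likelihood estimate of a divergence coefficient, the following is a minimal sketch under stated assumptions, not the subtree or whole-tree likelihood itself. It assumes each site is in one of two states, F0 (no functional divergence) or F1 (functional divergence), that the coefficient `theta` is the prior probability of F1, and that the per-site likelihoods `lik_f0` and `lik_f1` have already been computed by some phylogenetic model. The function names, the inputs, and the use of a generic EM iteration for a two-component mixture are all assumptions made for illustration.

```python
from typing import List, Sequence


def site_profile(lik_f0: Sequence[float], lik_f1: Sequence[float],
                 theta: float) -> List[float]:
    """Posterior probability of state F1 at each site, by Bayes' rule.

    lik_f0[k], lik_f1[k]: hypothetical likelihoods of the alignment column k
    under F0 and F1. They must not both be zero at the same site.
    """
    return [theta * b / (theta * b + (1.0 - theta) * a)
            for a, b in zip(lik_f0, lik_f1)]


def estimate_theta(lik_f0: Sequence[float], lik_f1: Sequence[float],
                   tol: float = 1e-10, max_iter: int = 1000) -> float:
    """Maximum-likelihood mixture weight via a generic EM iteration.

    This is a standard two-component mixture update, swapped in here as an
    assumption; it is not claimed to be the numerical algorithm of the paper.
    """
    theta = 0.5  # arbitrary interior starting value
    for _ in range(max_iter):
        # E-step: posterior of F1 per site; M-step: average the posteriors.
        post = site_profile(lik_f0, lik_f1, theta)
        new_theta = sum(post) / len(post)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


if __name__ == "__main__":
    # Made-up per-site likelihoods, for demonstration only.
    f0 = [0.10, 0.20, 0.05, 0.30]
    f1 = [0.30, 0.05, 0.20, 0.10]
    theta_hat = estimate_theta(f0, f1)
    print(theta_hat, site_profile(f0, f1, theta_hat))
```

Sites with a high posterior under this sketch would be the candidates for residues involved in functional divergence; how the per-site likelihoods are obtained is exactly where the subtree and whole-tree approaches differ, and that part is not reproduced here.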