Zhou Tong, Enyeart Peter J, Wilke Claus O
Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, Texas, United States of America.
PLoS One. 2008;3(11):e3765. doi: 10.1371/journal.pone.0003765. Epub 2008 Nov 19.
Positive selection for protein function can lead to multiple mutations within a small stretch of DNA, i.e., to a cluster of mutations. Recently, Wagner proposed a method to detect such mutation clusters. His method, however, did not take into account that residues with high solvent accessibility are inherently more variable than residues with low solvent accessibility. Here, we propose a new algorithm to detect clustered evolution. Our algorithm controls for different substitution probabilities at buried and exposed sites in the tertiary protein structure, and uses random permutations to calculate accurate P values for inferred clusters. We apply the algorithm to genomes of bacteria, fly, and mammals, and find several clusters of mutations in functionally important regions of proteins. Surprisingly, clustered evolution is relatively rare: only between 2% and 10% of the genes we analyze contain a statistically significant mutation cluster. We also find that not controlling for solvent accessibility leads to an excess of clusters in terminal and solvent-exposed regions of proteins. Our algorithm provides a novel method to identify functionally relevant divergence between groups of species. Moreover, it could be useful for detecting artifacts in automatically assembled genomes.
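The idea of controlling for solvent accessibility while computing permutation-based P values can be illustrated with a minimal Python sketch. It is not the algorithm published in the paper: the cluster statistic (largest number of substitutions in a sliding window), the window size, and the function names `max_window_count`, `stratified_shuffle`, and `cluster_p_value` are all assumptions made for illustration. The only element taken from the abstract is the principle that substitutions are permuted separately among buried and exposed sites, so that the null distribution keeps the different substitution rates of the two classes.

```python
import random

def max_window_count(subs, window):
    """Largest number of substitutions in any run of `window` consecutive sites.
    (Assumed cluster statistic; the paper may use a different one.)"""
    if window >= len(subs):
        return sum(subs)
    current = sum(subs[:window])
    best = current
    for i in range(window, len(subs)):
        # Slide the window one site to the right.
        current += subs[i] - subs[i - window]
        best = max(best, current)
    return best

def stratified_shuffle(subs, exposed, rng):
    """Permute substitutions among exposed sites and among buried sites
    separately, preserving the substitution count of each class."""
    shuffled = list(subs)
    for state in (True, False):
        idx = [i for i, e in enumerate(exposed) if e == state]
        vals = [subs[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            shuffled[i] = v
    return shuffled

def cluster_p_value(subs, exposed, window=10, n_perm=2000, seed=0):
    """Return the observed statistic and a permutation P value for it."""
    rng = random.Random(seed)
    observed = max_window_count(subs, window)
    hits = sum(
        max_window_count(stratified_shuffle(subs, exposed, rng), window) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimated P value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy protein of 100 sites with 5 substitutions at positions 40-44.
subs = [1 if 40 <= i < 45 else 0 for i in range(100)]

# Case 1: every site is buried, so the tight cluster is unexpected.
all_buried = [False] * 100
obs_buried, p_buried = cluster_p_value(subs, all_buried, window=5)

# Case 2: only sites 40-49 are exposed, so the substitutions could not
# have fallen anywhere else and the "cluster" is not significant.
patch_exposed = [40 <= i < 50 for i in range(100)]
obs_patch, p_patch = cluster_p_value(subs, patch_exposed, window=10)
```

In the second toy case every permutation keeps all five substitutions inside the exposed patch, so the apparent cluster receives a P value of 1. This mirrors the abstract's point that ignoring solvent accessibility produces an excess of clusters in solvent-exposed regions.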