Chen K, Durand D, Farach-Colton M
Department of Electrical Engineering and Computer Science, University of California, Berkeley 94720, USA.
J Comput Biol. 2000;7(3-4):429-47. doi: 10.1089/106652700750050871.
Large-scale gene duplication is a major force driving the evolution of genetic functional innovation. Whole-genome duplications are widely believed to have played an important role in the evolution of the maize, yeast, and vertebrate genomes. The use of evolutionary trees to analyze the history of gene duplication and estimate duplication times provides a powerful tool for studying this process. Many studies in the molecular evolution literature have used this approach on small data sets, using analyses performed by hand. The rapid growth of genetic sequence data will soon allow similar studies on a genomic scale, but such studies will be limited unless the analysis can be automated. Even existing data sets admit alternative hypotheses that would be too tedious to consider without automation. In this paper, we describe a program called NOTUNG that facilitates large-scale analysis, using both rooted and unrooted trees. When tested on trees analyzed in the literature, NOTUNG consistently yielded results that agree with the assessments in the original publications. Thus, NOTUNG provides a basic building block for inferring duplication dates from gene trees automatically and can also be used as an exploratory analysis tool for evaluating alternative hypotheses.