Pollock D D, Eisen J A, Doggett N A, Cummings M P
Theoretical Biology and Biophysics, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.
Mol Biol Evol. 2000 Dec;17(12):1776-88. doi: 10.1093/oxfordjournals.molbev.a026278.
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combining DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. To demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.