Bielawski Joseph P
Department of Biology, Department of Mathematics & Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
Curr Protoc Mol Biol. 2013 Jan;Chapter 19:Unit 19.1. doi: 10.1002/0471142727.mb1901s101.
The field of molecular evolution, which includes genome evolution, is devoted to finding variation within and between groups of organisms and explaining the processes responsible for generating this variation. Many DNA changes are believed to have little to no functional effect, and a neutral process will best explain their evolution. Thus, a central task is to discover which changes had positive fitness consequences and were subject to Darwinian natural selection during the course of evolution. Due to the size and complexity of modern molecular datasets, the field has come to rely extensively on statistical modeling techniques to meet this analytical challenge. For DNA sequences that encode proteins, one of the most powerful approaches is to employ a statistical model of codon evolution. This unit provides a general introduction to the practice of modeling codon evolution using the statistical framework of maximum likelihood. Four real-data analysis activities are used to illustrate the principles of parameter estimation, robustness, hypothesis testing, and site classification. Each activity includes an explicit analytical protocol based on programs provided by the Phylogenetic Analysis by Maximum Likelihood (PAML) package.