Blomberg Simon P, Garland Theodore, Ives Anthony R
Department of Biology, University of California, Riverside, California 92521, USA.
Evolution. 2003 Apr;57(4):717-45. doi: 10.1111/j.0014-3820.2003.tb00285.x.
The primary rationale for the use of phylogenetically based statistical methods is that phylogenetic signal, the tendency for related species to resemble each other, is ubiquitous. Whether this assertion is true for a given trait in a given lineage is an empirical question, but general tools for detecting and quantifying phylogenetic signal are inadequately developed. We present new methods for continuous-valued characters that can be implemented with either phylogenetically independent contrasts or generalized least-squares models. First, a simple randomization procedure allows one to test the null hypothesis of no pattern of similarity among relatives. The test demonstrates correct Type I error rate at a nominal alpha = 0.05 and good power (0.8) for simulated datasets with 20 or more species. Second, we derive a descriptive statistic, K, which allows valid comparisons of the amount of phylogenetic signal across traits and trees. Third, we provide two biologically motivated branch-length transformations, one based on the Ornstein-Uhlenbeck (OU) model of stabilizing selection, the other based on a new model in which character evolution can accelerate or decelerate (ACDC) in rate (e.g., as may occur during or after an adaptive radiation). Maximum likelihood estimation of the OU (d) and ACDC (g) parameters can serve as tests for phylogenetic signal because an estimate of d or g near zero implies that a phylogeny with little hierarchical structure (a star) offers a good fit to the data. Transformations that improve the fit of a tree to comparative data will increase power to detect phylogenetic signal and may also be preferable for further comparative analyses, such as of correlated character evolution. 
Application of the methods to data from the literature revealed that, for trees with 20 or more species, 92% of traits exhibited significant phylogenetic signal (randomization test), including behavioral and ecological ones that are thought to be relatively evolutionarily malleable (e.g., highly adaptive) and/or subject to relatively strong environmental (nongenetic) effects or high levels of measurement error. Irrespective of sample size, most traits (but not body size, on average) showed less signal than expected given the topology, branch lengths, and a Brownian motion model of evolution (i.e., K was less than one), which may be attributed to adaptation and/or measurement error in the broad sense (including errors in estimates of phenotypes, branch lengths, and topology). Analysis of variance of log K for all 121 traits (from 35 trees) indicated that behavioral traits exhibit lower signal than body size, morphological, life-history, or physiological traits. In addition, physiological traits (corrected for body size) showed less signal than did body size itself. For trees with 20 or more species, the estimated OU (25% of traits) and/or ACDC (40%) transformation parameter differed significantly from both zero and unity, indicating that a hierarchical tree with less (or occasionally more) structure than the original better fit the data and so could be preferred for comparative analyses.
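The randomization test and the K statistic summarized above can be illustrated with a small generalized least-squares sketch. This is not the authors' code. It assumes the phylogeny is supplied as a variance-covariance matrix `C` of shared branch lengths under Brownian motion, and the function names (`gls_mean`, `mse_phylo`, `blomberg_k`, `randomization_test`), the toy four-species tree, and the trait values are all hypothetical. K is computed as the observed ratio MSE0/MSE divided by its Brownian-motion expectation, (tr(C) - n / sum(C^-1)) / (n - 1). The permutation p-value adds one to numerator and denominator, which is a common convention assumed here rather than a detail taken from the abstract.

```python
import numpy as np

def gls_mean(x, C_inv):
    """Phylogenetically weighted (GLS) estimate of the root mean."""
    ones = np.ones(len(x))
    return (ones @ C_inv @ x) / (ones @ C_inv @ ones)

def mse_phylo(x, C_inv):
    """GLS mean squared error of the tip data given the tree.
    Lower values mean the data fit the tree's hierarchy better."""
    r = x - gls_mean(x, C_inv)
    return (r @ C_inv @ r) / (len(x) - 1)

def blomberg_k(x, C):
    """K: observed MSE0/MSE divided by its expectation under Brownian motion."""
    n = len(x)
    C_inv = np.linalg.inv(C)
    r = x - gls_mean(x, C_inv)
    mse0 = (r @ r) / (n - 1)           # ignores the tree
    mse = (r @ C_inv @ r) / (n - 1)    # accounts for the tree
    expected = (np.trace(C) - n / C_inv.sum()) / (n - 1)
    return (mse0 / mse) / expected

def randomization_test(x, C, n_perm=999, seed=0):
    """Shuffle trait values across tips; the p-value is the share of
    permutations whose MSE is as low as or lower than the observed one."""
    rng = np.random.default_rng(seed)
    C_inv = np.linalg.inv(C)
    observed = mse_phylo(x, C_inv)
    hits = 0
    for _ in range(n_perm):
        if mse_phylo(rng.permutation(x), C_inv) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # assumed +1 correction

# Hypothetical balanced tree ((A,B),(C,D)): total height 2, sister pairs share 1.
C = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])
x = np.array([1.0, 1.2, 3.0, 3.1])     # made-up trait values; sisters are similar

k_value = blomberg_k(x, C)
p_value = randomization_test(x, C)
print(f"K = {k_value:.3f}, randomization p = {p_value:.3f}")
```

With only four tips there are just 24 distinct permutations, so the p-value here is illustrative only; the abstract reports good power for datasets of 20 or more species. On a star phylogeny (`C` proportional to the identity matrix) this K equals one for any data, because the tree then carries no hierarchical information.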