Walsh H E, Kidd M G, Moum T, Friesen V L
Department of Biology, Queen's University, Kingston, Ontario, K7L 3N6, Canada.
Cell Biology Department, Institute of Medical Biology, University of Tromsø, N9037, Tromsø, Norway.
Evolution. 1999 Jun;53(3):932-937. doi: 10.1111/j.1558-5646.1999.tb05386.x.
Although phylogenetic hypotheses can provide insights into mechanisms of evolution, their utility is limited by our inability to differentiate simultaneous speciation events (hard polytomies) from rapid cladogenesis (soft polytomies). In the present paper, we tested the potential for statistical power analysis to differentiate between hard and soft polytomies in molecular phylogenies. Classical power analysis is typically used a priori to determine the sample size required to detect a particular effect size at a particular level of significance (α) with a certain power (1 - β). A posteriori, power analysis is used to infer whether failure to reject a null hypothesis results from lack of an effect or from insufficient data (i.e., low power). We adapted this approach to molecular data to infer whether polytomies result from simultaneous branching events or from insufficient sequence information. We then used this approach to determine the amount of sequence data (sample size) required to detect a positive branch length (effect size). A worked example is provided based on the auklets (Charadriiformes: Alcidae), a group of seabirds among which relationships are represented by a polytomy, despite analyses of over 3000 bp of sequence data. We demonstrate the calculation of effect sizes and sample sizes from sequence data using a normal curve test for difference of a proportion from an expected value and a t-test for a difference of a mean from an expected value. Power analyses indicated that the data for the auklets should be sufficient to differentiate speciation events that occurred at least 100,000 yr apart (the duration of the shortest glacial and interglacial events of the Pleistocene), 2.6 million years ago.
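The a priori and a posteriori calculations described in the abstract can be illustrated with a minimal Python sketch of a one-sided normal curve test for a proportion. The abstract does not give its formulas, so the sketch assumes Cohen's arcsine-transformed effect size (h) for a proportion against an expected value. The function names, the substitution rate (0.01 substitutions per site per million years), and the resulting expected proportion of 0.001 are hypothetical illustrations, not values from the paper.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p0=0.0):
    """Effect size h: difference between arcsine-transformed proportions.

    Assumption: Cohen's transform phi = 2 * asin(sqrt(p)); the abstract
    does not state which effect-size index the authors used.
    """
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p0))


def required_sites(p1, p0=0.0, alpha=0.05, power=0.8):
    """A priori: number of sites needed for a one-sided test to detect p1 > p0."""
    h = cohens_h(p1, p0)
    if h <= 0:
        raise ValueError("p1 must exceed p0 for a positive branch length")
    z = NormalDist()  # standard normal distribution
    n = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) / h) ** 2
    return math.ceil(n)


def achieved_power(n_sites, p1, p0=0.0, alpha=0.05):
    """A posteriori: power of the same one-sided test given n_sites."""
    h = cohens_h(p1, p0)
    z = NormalDist()
    return z.cdf(h * math.sqrt(n_sites) - z.inv_cdf(1 - alpha))


if __name__ == "__main__":
    # Hypothetical inputs: 0.01 substitutions/site/Myr over a 0.1 Myr internode
    # gives an expected proportion of changed sites of 0.001.
    expected_proportion = 0.01 * 0.1
    print("sites needed:", required_sites(expected_proportion))
    print("power with 3000 sites:", achieved_power(3000, expected_proportion))
```

Under these assumptions, `required_sites` answers the a priori question (how much sequence is needed to detect a branch of a given length), while `achieved_power` answers the a posteriori one (whether an unresolved polytomy could simply reflect too little data).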