Tsirogiannis Constantinos, Sandel Brody
MADALGO and Department of Bioscience, Aarhus University, Aarhus, Denmark.
PLoS One. 2016 Apr 7;11(4):e0151167. doi: 10.1371/journal.pone.0151167. eCollection 2016.
For many applications in ecology, it is important to examine the phylogenetic relations between two communities of species. More formally, let T be a phylogenetic tree and let A and B be two samples of its tips, representing the examined communities. We want to compute a value that expresses the phylogenetic diversity between A and B in T. Several measures can do this; these are the so-called phylogenetic beta diversity (β-diversity) measures. Two popular measures of this kind are the Community Distance (CD) and the Common Branch Length (CBL). In most applications, it is not sufficient to compute the value of a β-diversity measure for two communities A and B; we also want to know whether this value is relatively large or small compared to all possible pairs of communities in T that have the same sizes. To decide this, the ideal approach is to compute a standardised index that involves the mean and the standard deviation of this measure among all pairs of species samples that have the same numbers of elements as A and B. However, no existing method computes this index exactly and efficiently for CD and CBL. We present analytical expressions for computing the expectation and the standard deviation of CD and CBL. Based on these expressions, we describe efficient algorithms for computing the standardised indices of the two measures. Using standard algorithmic analysis, we provide guarantees on the theoretical efficiency of our algorithms. We implemented our algorithms and measured their efficiency in practice. Our implementations compute the standardised indices of CD and CBL in under twenty seconds for a hundred pairs of samples on trees with 7 · 10^4 tips. Our implementations are available through the R package PhyloMeasures.
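The standardised index described above is (observed value − mean) / standard deviation, where the mean and standard deviation are taken over all pairs of tip samples with the same sizes as A and B. The sketch below only makes that definition concrete on a hypothetical four-tip tree. It enumerates every pair of samples, so it is exponential in the number of tips; it is not the authors' analytical algorithm and not the PhyloMeasures API. Assumptions (not stated in the abstract): CD is the mean path length over all pairs (u, v) with u in A and v in B; CBL is taken here as the total length of edges shared by the unions of root-to-tip paths of A and B (a rooted convention that may differ from the paper's); and the reference samples are drawn independently and uniformly among samples of sizes |A| and |B|, so they may overlap.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Hypothetical toy tree: child -> (parent, length of the edge to the parent).
# The root "r" has no entry. Each edge is identified by its child node.
PARENT = {
    "x": ("r", 1.0),
    "y": ("r", 2.0),
    "a": ("x", 1.0),
    "b": ("x", 3.0),
    "c": ("y", 1.0),
    "d": ("y", 2.0),
}
TIPS = ["a", "b", "c", "d"]


def root_path(node):
    """Edges (named by their child node) on the path from node up to the root."""
    edges = []
    while node in PARENT:
        edges.append(node)
        node = PARENT[node][0]
    return edges


def cost(u, v):
    """Path length between tips u and v: edges lying on exactly one root path."""
    pu, pv = set(root_path(u)), set(root_path(v))
    return sum(PARENT[e][1] for e in pu ^ pv)


def community_distance(A, B):
    """CD (assumed definition): mean cost over all pairs (u in A, v in B)."""
    return fmean(cost(u, v) for u in A for v in B)


def branch_set(S):
    """Union of root-to-tip paths of the tips in S (assumed rooted convention)."""
    return set().union(*(root_path(t) for t in S))


def common_branch_length(A, B):
    """CBL (assumed convention): total length of edges shared by both branch sets."""
    return sum(PARENT[e][1] for e in branch_set(A) & branch_set(B))


def standardised_index(measure, A, B):
    """Brute force: (value - mean) / population std over all sample pairs.

    Exponential in len(TIPS); raises ZeroDivisionError if all values are equal.
    """
    values = [measure(X, Y)
              for X in combinations(TIPS, len(A))
              for Y in combinations(TIPS, len(B))]
    mu, sigma = fmean(values), pstdev(values)
    return (measure(A, B) - mu) / sigma


if __name__ == "__main__":
    print(community_distance(("a", "b"), ("c", "d")))  # 6.5
    print(common_branch_length(("a", "b"), ("a", "c")))  # 2.0
    print(standardised_index(community_distance, ("a", "b"), ("c", "d")))
```

The exact mean and standard deviation computed here by enumeration are the quantities for which the paper gives analytical expressions; replacing the enumeration with closed-form moments is what makes the computation feasible on trees with tens of thousands of tips.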