Department of Computer Science, Iowa State University, Ames, IA 50010, United States.
Virus and Prion Research Unit, National Animal Disease Center, USDA-ARS, Ames, IA 50010, United States.
Bioinformatics. 2023 Jun 30;39(Suppl 1):i177-i184. doi: 10.1093/bioinformatics/btad263.
The classic quantitative measure of phylogenetic diversity (PD) has been used to address problems in conservation biology, microbial ecology, and evolutionary biology. PD is the minimum total length of the branches in a phylogeny required to cover a specified set of taxa on the phylogeny. A common goal in applications of PD has been to identify a set of k taxa that maximizes PD on a given phylogeny, and this goal has been mirrored by active research into efficient algorithms for the problem. Other descriptive statistics, such as the minimum PD, average PD, and standard deviation of PD, can provide invaluable insight into the distribution of PD across a phylogeny (relative to a fixed value of k). However, there has been limited or no research on computing these statistics, especially for each clade in a phylogeny, which would enable direct comparisons of PD between clades. We introduce efficient algorithms for computing PD and the associated descriptive statistics for a given phylogeny and each of its clades. In simulation studies, we demonstrate the ability of our algorithms to analyze large-scale phylogenies, with applications in ecology and evolutionary biology. The software is available at https://github.com/flu-crew/PD_stats.
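The definition of PD and the k-subset statistics above can be made concrete with a small brute-force sketch. It assumes a rooted tree stored as a hypothetical `child -> (parent, branch length)` dictionary, and it reads "minimum total length of the branches required to cover" a taxon set as the smallest subtree connecting those taxa (some studies instead use a rooted variant that also counts the path to the root). The minimum, maximum, mean, and standard deviation are obtained by enumerating every k-subset, which grows exponentially with k. This is a naive baseline for checking small cases, not the efficient algorithms introduced in the paper, and the function names are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy tree: child -> (parent, branch length); root "r" has no entry.
TREE = {"x": ("r", 1.0), "A": ("x", 2.0), "B": ("x", 3.0), "C": ("r", 4.0)}


def leaves(tree):
    """Nodes that never appear as a parent are the taxa (leaves)."""
    parents = {p for p, _ in tree.values()}
    return sorted(c for c in tree if c not in parents)


def phylogenetic_diversity(tree, taxa):
    """Total branch length of the minimal subtree connecting `taxa`."""
    taxa = set(taxa)
    if len(taxa) < 2:
        return 0.0
    # For every non-root node, count how many selected taxa lie below it
    # by walking each selected taxon up to the root.
    below = {}
    for t in taxa:
        node = t
        while node in tree:
            below[node] = below.get(node, 0) + 1
            node = tree[node][0]
    # The branch above a node is needed iff its subtree holds some,
    # but not all, of the selected taxa.
    return sum(tree[v][1] for v, c in below.items() if c < len(taxa))


def clade_leaves(tree, clade_root):
    """Leaves whose path to the root passes through `clade_root`."""
    result = []
    for leaf in leaves(tree):
        node = leaf
        while True:
            if node == clade_root:
                result.append(leaf)
                break
            if node not in tree:
                break
            node = tree[node][0]
    return result


def pd_statistics(tree, k, taxa=None):
    """Brute-force PD statistics over all k-subsets of `taxa` (default: all leaves)."""
    pool = leaves(tree) if taxa is None else list(taxa)
    values = [phylogenetic_diversity(tree, s) for s in combinations(pool, k)]
    if not values:
        raise ValueError("k exceeds the number of available taxa")
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "std": pstdev(values)}
```

On the toy tree, the three 2-taxon sets {A, B}, {A, C}, and {B, C} have PD 5, 7, and 8, so `pd_statistics(TREE, 2)` reports a minimum of 5, a maximum of 8, and a mean of 20/3. Passing `clade_leaves(TREE, "x")` as `taxa` restricts the same statistics to the clade below node `x`, which mirrors the per-clade comparison described in the abstract.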