Minh Bui Quang, Klaere Steffen, von Haeseler Arndt
Center for Integrative Bioinformatics, Vienna, Max F Perutz Laboratories, University of Vienna, Medical University of Vienna, Veterinary University of Vienna, Dr-Bohr-Gasse 9/6, A-1030, Vienna, Austria.
Syst Biol. 2006 Oct;55(5):769-73. doi: 10.1080/10635150600981604.
We consider a (phylogenetic) tree with n labeled leaves, the taxa, and a length for each branch in the tree. For any subset of k taxa, the phylogenetic diversity is defined as the sum of the branch-lengths of the minimal subtree connecting the taxa in the subset. We introduce two time-efficient algorithms (greedy and pruning) to compute a subset of size k with maximal phylogenetic diversity in O(n log k) and O[n + (n-k) log (n-k)] time, respectively. The greedy algorithm is an efficient implementation of the so-called greedy strategy (Steel, 2005; Pardi and Goldman, 2005), whereas the pruning algorithm provides an alternative description of the same problem. Both algorithms compute within seconds a subtree with maximal phylogenetic diversity for trees with 100,000 taxa or more.
我们考虑一棵具有(n)个带标签叶子(即分类单元)的(系统发育)树,并且树中每个分支都有一个长度。对于任意(k)个分类单元的子集,系统发育多样性定义为连接该子集中分类单元的最小子树的分支长度之和。我们引入了两种高效的算法(贪心算法和剪枝算法),分别在(O(n \log k))和(O[n + (n - k) \log (n - k)])时间内计算出具有最大系统发育多样性的大小为(k)的子集。贪心算法是所谓贪心策略的一种有效实现方式(Steel,2005;Pardi和Goldman,2005),而剪枝算法则为同一问题提供了另一种描述。对于具有100,000个或更多分类单元的树,这两种算法都能在数秒内计算出具有最大系统发育多样性的子树。