Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4H7.
Syst Biol. 2012 Oct;61(5):811-21. doi: 10.1093/sysbio/sys033. Epub 2012 Feb 15.
We illustrate how recently developed large sequence-length approximations to probabilities of correct phylogenetic reconstruction for maximum likelihood estimation can be used to evaluate experimental design strategies. The specific criterion of interest is the probability of correctly resolving an a priori defined split of interest in a phylogenetic tree. The design strategies considered include increasing taxon sampling and increasing sequence length. Our analyses of specific examples strongly suggest that it is better to sample taxa that connect as closely as possible to the split of interest. Assuming this can be done, these examples suggest that it is better to sample additional taxa than to add a comparable number of sites for the existing taxa. If the rates of evolution in the added taxa are slow, it is better to choose taxa connecting to a long edge, but if the rates are comparable to those of a sister lineage, sampling taxa connected to a long edge is not necessarily the best strategy. We also examined deleting taxa while increasing the number of sites. Although deleting a small number of taxa distant from the split of interest can be beneficial, deleting too many taxa, or choosing poorly which taxa to delete, can lead to smaller probabilities of correct reconstruction than for the original sequence data.