Department of Mathematics and Statistics, University of New Mexico, United States.
Mol Phylogenet Evol. 2021 Aug;161:107162. doi: 10.1016/j.ympev.2021.107162. Epub 2021 Apr 6.
Species trees that can generate a nonmatching gene tree topology that is more probable than the topology matching the species tree are said to be in an anomaly zone. We introduce some heuristic approaches to infer whether species trees are in anomaly zones when it is difficult or impossible to compute the entire distribution of gene tree topologies. Here, probabilities of unrooted, unranked, and ranked gene tree topologies under the multispecies coalescent are used. A ranked tree can be viewed as an unranked tree with a temporal ordering of its internal nodes. Overall, considering probabilities of unrooted or unranked gene tree topologies within one nearest neighbor interchange from the species tree topology is a reasonable heuristic to infer the existence of anomalous unrooted or unranked gene trees, respectively. We investigated a test proposed by Linkem et al. (2016) which classifies a species tree as being in an unranked anomaly zone if there is a subset of four taxa in an unranked anomaly zone. We find this test to have high true positive rates, but it can also have high false positive rates. For ranked trees, because at least one of the most probable ranked gene tree topologies must have the same unranked topology as the species tree, we propose to use only those ranked gene trees that have topologies that match the unranked species tree topology. We find that the probability that the species tree is in unrooted and unranked anomaly zones tends to increase with the speciation rate, and the probability of all three types of anomaly zones increases rapidly with the number of taxa. We find that probabilities that species trees are in an anomaly zone can be quite high for moderately high speciation rates.
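The definition in the first sentence, and the heuristic of checking only gene tree topologies within one nearest neighbor interchange (NNI) of the species tree, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `anomalous_gene_trees`, the Newick-like string keys, and all probability values are hypothetical, and the probabilities are assumed to have been computed elsewhere under the multispecies coalescent.

```python
from typing import Dict, Iterable, List, Optional


def anomalous_gene_trees(
    probs: Dict[str, float],
    species_topology: str,
    candidates: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the candidate topologies that are strictly more probable
    than the gene tree topology matching the species tree.

    probs            -- hypothetical mapping from topology to probability
    species_topology -- key of the topology that matches the species tree
    candidates       -- topologies to check (e.g. NNI neighbors);
                        if None, every topology in probs is checked
    """
    p_match = probs[species_topology]
    pool = list(probs.keys()) if candidates is None else list(candidates)
    return [
        t for t in pool
        if t != species_topology and probs.get(t, 0.0) > p_match
    ]


# Made-up values for illustration only; they are not taken from the paper
# and cover only a subset of topologies, so they do not sum to 1.
probs = {
    "(((A,B),C),D)": 0.10,   # matches the species tree
    "((A,B),(C,D))": 0.12,   # one NNI away from the species tree
    "(((A,C),B),D)": 0.05,   # one NNI away from the species tree
}
species = "(((A,B),C),D)"
nni_neighbors = ["((A,B),(C,D))", "(((A,C),B),D)"]

hits = anomalous_gene_trees(probs, species, nni_neighbors)
in_anomaly_zone = len(hits) > 0
```

A non-empty result means that, for the supplied probabilities, at least one nonmatching topology in the candidate set is more probable than the matching one. Restricting `candidates` to NNI neighbors avoids enumerating the full distribution, at the cost of possibly missing anomalous gene trees that lie farther from the species tree.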