Simmons Mark P, Sloan Daniel B, Gatesy John
Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Mol Phylogenet Evol. 2016 Apr;97:76-89. doi: 10.1016/j.ympev.2015.12.013. Epub 2016 Jan 6.
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly between studies and often within a single study as well. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes and analyses of the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors in coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
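The subsampling idea described above can be illustrated with a minimal sketch. This is not the authors' script: it assumes gene trees are supplied as nested Python tuples of taxon names, measures congruence as one minus a normalized Robinson-Foulds distance computed on the taxa that two trees share (one simple way to account for differential taxon sampling), and keeps the top-ranked fraction of genes. The function names, the toy trees, and the 50% cutoff are all hypothetical.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def prune(tree, keep):
    """Restrict a tree to the taxa in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (prune(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def splits(tree):
    """Return the nontrivial bipartitions of the tree, treated as unrooted."""
    taxa = frozenset(leaves(tree))
    ref = min(taxa)  # each split is stored as the side lacking this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(c) for c in node))
        if 1 < len(clade) < len(taxa) - 1:
            found.add(taxa - clade if ref in clade else clade)
        return clade

    visit(tree)
    return found

def congruence(t1, t2):
    """1 - normalized RF distance on shared taxa; None if uninformative."""
    shared = leaves(t1) & leaves(t2)
    if len(shared) < 4:
        return None
    s1, s2 = splits(prune(t1, shared)), splits(prune(t2, shared))
    if not s1 and not s2:
        return None
    return 1 - len(s1 ^ s2) / (len(s1) + len(s2))

def average_congruence(trees):
    """Mean congruence of each gene tree with all comparable gene trees."""
    scores = {name: [] for name in trees}
    for a, b in combinations(trees, 2):
        c = congruence(trees[a], trees[b])
        if c is not None:
            scores[a].append(c)
            scores[b].append(c)
    return {name: sum(v) / len(v) for name, v in scores.items() if v}

def subsample(trees, fraction):
    """Names of the most congruent genes (assumed cutoff: top `fraction`)."""
    means = average_congruence(trees)
    ranked = sorted(means, key=means.get, reverse=True)
    return ranked[:max(1, round(len(ranked) * fraction))]

# Toy gene trees (hypothetical); g2 lacks taxon E.
gene_trees = {
    "g1": ((("A", "B"), "C"), ("D", "E")),
    "g2": (("A", "B"), ("C", "D")),
    "g3": ((("A", "C"), "B"), ("D", "E")),
    "g4": ((("A", "D"), "E"), ("B", "C")),
}
print(average_congruence(gene_trees))
print(subsample(gene_trees, 0.5))
```

In this toy case g1 and g2 agree perfectly once g1 is pruned to the four taxa they share, so they rank highest and are the two genes retained. A real analysis would read Newick files and would need a deliberate choice of distance measure and cutoff.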