Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK 99709, USA.
Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France.
Syst Biol. 2022 Jun 16;71(4):929-942. doi: 10.1093/sysbio/syab008.
A simple graphical device, the simplex plot of quartet concordance factors, is introduced to aid in the exploration of a collection of gene trees on a common set of taxa. A single plot summarizes all gene tree discord and allows for visual comparison to the expected discord from the multispecies coalescent model (MSC) of incomplete lineage sorting on a species tree. A formal statistical procedure is described that can quantify the deviation from expectation for each subset of four taxa, suggesting when the data are not in accord with the MSC, and thus that either gene tree inference error is substantial or a more complex model such as that on a network may be required. If the collection of gene trees is in accord with the MSC, the plots reveal when substantial incomplete lineage sorting is present. Applications to both simulated and empirical multilocus data sets illustrate the insights provided. [Gene tree discordance; hypothesis test; multispecies coalescent model; quartet concordance factor; simplex plot; species tree].
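As a minimal sketch of the quantities behind the plot, the Python below computes the expected quartet concordance factors under the MSC and places a concordance factor triple in a two-dimensional simplex. It relies on the standard MSC result that, for a four-taxon species tree whose internal branch has length t in coalescent units, the gene tree topology matching the species tree has probability 1 - (2/3)exp(-t) and each of the two discordant topologies has probability (1/3)exp(-t). The function names, the input format (three topology counts per quartet), and the triangle's vertex layout are assumptions made for illustration. They are not taken from the article or its software, and the formal test described in the abstract is not reproduced here.

```python
import math


def expected_quartet_cf(t):
    """Expected concordance factors under the MSC for one quartet.

    t is the internal branch length of the four-taxon species tree,
    in coalescent units. Returns (concordant, discordant, discordant).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def empirical_quartet_cf(counts):
    """Normalize counts of the three quartet topologies to proportions.

    counts is a hypothetical input: the number of gene trees displaying
    each of the three possible unrooted topologies on four taxa.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no gene trees resolve this quartet")
    return tuple(c / total for c in counts)


def simplex_xy(cf):
    """Map a triple (p1, p2, p3) summing to 1 to a point in a triangle.

    Assumed layout: vertices (0, 0), (1, 0), (0.5, sqrt(3)/2) for
    p1 = 1, p2 = 1, p3 = 1 respectively.
    """
    p1, p2, p3 = cf
    x = p2 + 0.5 * p3
    y = (math.sqrt(3) / 2.0) * p3
    return (x, y)


if __name__ == "__main__":
    # Illustrative counts only, not data from the article.
    observed = empirical_quartet_cf((60, 20, 20))
    print("observed CF:", observed, "at", simplex_xy(observed))
    print("expected CF for t = 0.5:", expected_quartet_cf(0.5))
```

Under this layout the centroid of the triangle corresponds to t = 0, where all three topologies are equally likely, and the expected points for larger t move along a line from the centroid toward the vertex of the concordant topology. Empirical points far from such lines are the ones a formal test would flag as out of accord with the MSC.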