School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia.
Research School of Biology, Australian National University, Canberra, ACT 2601, Australia.
Syst Biol. 2018 May 1;67(3):400-412. doi: 10.1093/sysbio/syx076.
A fundamental challenge in resolving evolutionary relationships across the tree of life is to account for heterogeneity in the evolutionary signal across loci. Studies of marsupial mammals have demonstrated that this heterogeneity can be substantial, leaving considerable uncertainty in the evolutionary timescale and relationships within the group. Using simulations and a new phylogenomic data set comprising nucleotide sequences of 1550 loci from 18 of the 22 extant marsupial families, we demonstrate the power of a method for identifying clusters of loci that support different phylogenetic trees. We find two distinct clusters of loci, each providing an estimate of the species tree that matches previously proposed resolutions of the marsupial phylogeny. We also identify a well-supported placement for the enigmatic marsupial moles (Notoryctes) that contradicts previous molecular estimates but is consistent with morphological evidence. The pattern of gene-tree variation across tree-space is characterized by changes in information content, GC content, substitution-model adequacy, and signatures of purifying selection in the data. In a simulation study, we show that incomplete lineage sorting can explain the division of loci into the two tree-topology clusters, as found in our phylogenomic analysis of marsupials. We also demonstrate the potential benefits of minimizing uncertainty from phylogenetic conflict for molecular dating. Our analyses reveal that Australasian marsupials appeared in the early Paleocene, whereas the diversification of present-day families occurred primarily during the late Eocene and early Oligocene. Our methods provide an intuitive framework for improving the accuracy and precision of phylogenetic inference and molecular dating using genome-scale data.
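The idea of grouping loci by the tree topology they support can be illustrated with a small sketch. The abstract does not state which tree distance or clustering algorithm was used, so everything below is an assumption made for illustration: gene trees are toy rooted trees written as nested tuples, distance is the Robinson-Foulds distance computed on rooted clades, and the partition is an exhaustive two-medoid search. The function names (`clades`, `rf_distance`, `two_medoid_clusters`) and the six-taxon data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Each clade is a frozenset of leaf names. Single leaves and the root
    clade are excluded because every tree on the same taxa shares them.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))  # drop the root clade
    return found


def rf_distance(a, b):
    """Robinson-Foulds distance on rooted clades (assumed metric)."""
    return len(clades(a) ^ clades(b))


def two_medoid_clusters(trees):
    """Split trees into two groups around the best pair of medoid trees.

    Exhaustive search over medoid pairs; only sensible for small inputs.
    """
    n = len(trees)
    dist = [[rf_distance(trees[i], trees[j]) for j in range(n)] for i in range(n)]
    best = None
    for m1, m2 in combinations(range(n), 2):
        cost = sum(min(dist[i][m1], dist[i][m2]) for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, m1, m2)
    _, m1, m2 = best
    return [0 if dist[i][m1] <= dist[i][m2] else 1 for i in range(n)]


# Hypothetical gene trees for six taxa, drawn from two conflicting topologies.
tx1 = ((("A", "B"), ("C", "D")), ("E", "F"))
tx2 = (((("A", "B"), ("C", "D")), "E"), "F")
ty1 = ((("A", "C"), ("B", "D")), ("E", "F"))
ty2 = (((("A", "C"), ("B", "D")), "E"), "F")

gene_trees = [tx1, tx2, tx1, ty1, ty2, ty1]
labels = two_medoid_clusters(gene_trees)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this toy case the loci separate cleanly because trees within a group differ by at most two clades while trees across groups differ by at least four. Real gene trees are far noisier, and a practical analysis would use a dedicated phylogenetics library and a principled choice of the number of clusters.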