Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA.
Florida Museum of Natural History, University of Florida, 1659 Museum Road, Gainesville, FL 32611, USA.
Syst Biol. 2019 Mar 1;68(2):298-316. doi: 10.1093/sysbio/syy064.
Phylogenomic data sets are illuminating many areas of the Tree of Life. However, the large size of these data sets alone may be insufficient to resolve problematic nodes in the most rapid evolutionary radiations, because inferences in zones of extraordinarily low phylogenetic signal can be sensitive to the model and method of inference, as well as the information content of loci employed. We used a data set of >3950 ultraconserved element (UCE) loci from a classic mammalian radiation, ground-dwelling squirrels of the tribe Marmotini (Sciuridae: Xerinae), to assess sensitivity of phylogenetic estimates to varying per-locus information content across four different inference methods (RAxML, ASTRAL, NJst, and SVDquartets). Persistent discordance was found in topology and bootstrap support between concatenation- and coalescent-based inferences; among methods within the coalescent framework; and within all methods in response to different filtering scenarios. Contrary to some recent empirical UCE-based studies, filtering by information content did not promote complete among-method concordance. Nevertheless, filtering did improve concordance relative to randomly selected locus sets, largely via improved consistency of two-step summary methods (particularly NJst) under conditions of higher average per-locus variation (and thus increasing gene tree precision). The benefits of phylogenomic data set filtering are variable among classes of inference methods and across different evolutionary scenarios, reiterating the complexities of resolving rapid radiations, even with robust taxon and character sampling.
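As a rough illustration of what filtering loci by information content against a random locus set can look like, here is a minimal Python sketch. It assumes that per-locus information content is measured as the number of parsimony-informative sites, which the abstract does not specify. The function names, the `keep` parameter, and the toy alignments are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def parsimony_informative_sites(alignment):
    """Count alignment columns in which at least two different bases
    each occur in at least two sequences. Gaps and ambiguity codes are
    ignored. `alignment` maps taxon name -> aligned sequence string."""
    count = 0
    # zip(*...) walks the alignment column by column (stops at the shortest sequence)
    for column in zip(*alignment.values()):
        states = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def filter_loci(loci, keep, seed=None):
    """Return two lists of locus names, each of length `keep`:
    the most informative loci, and a random set for comparison.
    `loci` maps locus name -> alignment (assumed input layout)."""
    ranked = sorted(
        loci,
        key=lambda name: parsimony_informative_sites(loci[name]),
        reverse=True,
    )
    rng = random.Random(seed)
    random_set = rng.sample(sorted(loci), keep)
    return ranked[:keep], random_set


# Toy data, invented purely for illustration.
toy_loci = {
    "uce-1": {"A": "ACGTAC", "B": "ACGTAC", "C": "ACGTAC", "D": "ACGTAC"},
    "uce-2": {"A": "ACGTAC", "B": "ACGTAC", "C": "TCGTAC", "D": "TCGTAC"},
    "uce-3": {"A": "ACGTAC", "B": "ACGTAT", "C": "TCGTAT", "D": "TCGTAC"},
}

top, random_set = filter_loci(toy_loci, keep=2, seed=0)
```

In a workflow of this kind, both locus sets would then be passed to each inference method and the resulting topologies compared. That comparison step is outside the scope of this sketch.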