Programa de Pós-graduação Ciências do Ambiente (CIAMB), Universidade Federal do Tocantins, Palmas, Tocantins, Brazil.
Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia.
Mol Biol Evol. 2018 Jul 1;35(7):1798-1811. doi: 10.1093/molbev/msy069.
Ultraconserved elements (UCEs) are popular markers for phylogenomic studies. They are relatively simple to collect from distantly related organisms, and contain sufficient information to infer relationships at almost all taxonomic levels. Most studies of UCEs use partitioning to account for variation in rates and patterns of molecular evolution among sites, for example by estimating an independent model of molecular evolution for each UCE. However, rates and patterns of molecular evolution vary substantially within as well as between UCEs, suggesting that there may be opportunities to improve how UCEs are partitioned for phylogenetic inference. We propose and evaluate two new partitioning methods for phylogenomic studies of UCEs: Sliding-Window Site Characteristics (SWSC) and UCE Site Position (UCESP). The first method uses site characteristics such as entropy, multinomial likelihood, and GC content to generate partitions that account for heterogeneity in rates and patterns of molecular evolution within each UCE. The second method groups together nucleotides that are found in similar physical locations within the UCEs. We examined the new methods with seven published data sets from a variety of taxa. We demonstrate that the UCESP method generates partitions that are worse than other strategies used to partition UCE data sets (e.g., one partition per UCE). The SWSC method, particularly when based on site entropies (SWSC-EN), generates partitions that account for within-UCE heterogeneity and leads to large increases in model fit. All of the methods, code, and data used in this study are available from https://github.com/Tagliacollo/PartitionUCE. Simplified code for implementing the best method, SWSC-EN, is available from https://github.com/Tagliacollo/PFinderUCE-SWSC-EN.
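The idea of partitioning a UCE by site entropy can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the repositories linked above). The abstract does not specify how windows are chosen, so the sketch assumes each UCE is split into three contiguous regions (left flank, core, right flank) by the pair of breakpoints that minimises the within-region sum of squared entropy deviations. The function names and the `min_len` parameter are hypothetical.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the A/C/G/T frequencies in one alignment column."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:  # column holds only gaps or ambiguity codes
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropies(seqs):
    """Per-site entropies for a list of equal-length aligned sequences."""
    return [site_entropy("".join(col)) for col in zip(*seqs)]


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_three_way_split(entropies, min_len=2):
    """Return breakpoints (i, j) giving regions [0:i], [i:j], [j:].

    Assumption of this sketch: the best split is the one that minimises
    the total within-region sum of squared entropy deviations, with each
    region at least `min_len` sites long.
    """
    n = len(entropies)
    if n < 3 * min_len:
        raise ValueError("alignment too short for three regions")
    best = None
    for i in range(min_len, n - 2 * min_len + 1):
        for j in range(i + min_len, n - min_len + 1):
            score = (sum_sq_dev(entropies[:i])
                     + sum_sq_dev(entropies[i:j])
                     + sum_sq_dev(entropies[j:]))
            if best is None or score < best[0]:
                best = (score, i, j)
    return best[1], best[2]


# Toy UCE: variable flanks (sites 0-2 and 6-8) around an invariant core (sites 3-5).
toy_uce = [
    "AAAGGGAAA",
    "CCCGGGCCC",
    "GGGGGGGGG",
    "TTTGGGTTT",
]
entropies = alignment_entropies(toy_uce)
left_end, core_end = best_three_way_split(entropies)
```

On this toy alignment the flanking sites have high entropy and the core sites have zero entropy, so the breakpoints fall at the flank/core boundaries. The exhaustive search is quadratic in UCE length, which is acceptable for loci of a few hundred to a thousand sites but is a design choice of this sketch rather than a property of the published method.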