Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution, Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; National Demonstration Center for Experimental Fisheries Science Education (Shanghai Ocean University), China.
School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA 98105, USA.
Mol Phylogenet Evol. 2018 Nov;128:192-202. doi: 10.1016/j.ympev.2018.07.018. Epub 2018 Jul 20.
The use of genome-scale data to infer phylogenetic relationships has gained popularity in recent years owing to progress in target-gene capture and sequencing techniques. Data filtering, the approach of excluding from analyses data that are inconsistent with the model, could presumably alleviate problems caused by systematic errors in phylogenetic inference. Several data-filtering criteria, including those based on evolutionary rate and molecular clocklikeness, have been proposed for selecting useful phylogenetic markers, yet few studies have tested these criteria using phylogenomic data. We developed a novel set of single-copy nuclear coding markers to capture thousands of target genes in gobioid fishes, a species-rich lineage of vertebrates, and tested the effects of data-filtering methods based on substitution rate and molecular clocklikeness while attempting to control for the compounding effects of missing data and variation in locus length. We found that molecular clocklikeness was a better predictor than overall substitution rate of the phylogenetic usefulness of molecular markers in our study. In addition, when the 100 best-ranked loci for our predictors were concatenated and analyzed using maximum likelihood, or combined in a coalescent-based species-tree analysis, the resulting trees showed a well-resolved topology of Gobioidei that mostly agrees with previous studies. However, trees generated from the 100 least clocklike loci frequently recovered conflicting, and in some cases clearly erroneous, topologies with strong support, indicating strong systematic biases in those datasets. Collectively, these results suggest that data filtering has the potential to improve the performance of phylogenetic inference both under a concatenation approach and under methods that rely on input from individual gene trees (i.e., coalescent species-tree approaches), which may be preferred in scenarios where incomplete lineage sorting is likely to be an issue.
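The filtering step described above, ranking loci by a clocklikeness predictor and keeping the best-ranked subset, can be illustrated with a minimal sketch. The abstract does not state how clocklikeness was scored, so the metric used here (the coefficient of variation of root-to-tip distances in each gene tree, with lower values treated as more clocklike) is an assumption chosen for illustration, and the function names and toy distances are hypothetical.

```python
from statistics import mean, pstdev


def clocklikeness_cv(root_to_tip):
    """Coefficient of variation of root-to-tip distances for one locus.

    ASSUMPTION: this is a common proxy for clocklikeness, not necessarily
    the measure used in the study. Lower values mean more clocklike.
    """
    m = mean(root_to_tip)
    if m == 0:
        raise ValueError("mean root-to-tip distance is zero")
    return pstdev(root_to_tip) / m


def rank_loci(loci, n):
    """Return the n most clocklike and the n least clocklike locus names.

    `loci` maps a locus name to its list of root-to-tip distances
    (hypothetical input format).
    """
    ordered = sorted(loci, key=lambda name: clocklikeness_cv(loci[name]))
    return ordered[:n], ordered[-n:]


# Toy data, invented for illustration only.
loci = {
    "locus_A": [0.10, 0.10, 0.10, 0.10],  # perfectly clocklike
    "locus_B": [0.10, 0.12, 0.09, 0.11],  # mildly rate-variable
    "locus_C": [0.05, 0.30, 0.10, 0.60],  # strongly rate-variable
}

best, worst = rank_loci(loci, 1)
print("most clocklike:", best)
print("least clocklike:", worst)
```

In a real analysis `n` would be 100, as in the abstract, and the distances would come from gene trees estimated for each locus. The same ranking function could be reused with a substitution-rate score to compare the two predictors.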