Allman Elizabeth S, Kubatko Laura S, Rhodes John A
Department of Mathematics and Statistics, PO Box 756660, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA.
Department of Statistics and Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA.
Syst Biol. 2017 Jul 1;66(4):620-636. doi: 10.1093/sysbio/syw103.
Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data become more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and chromosomal inversion are expected to result in changes in the underlying gene trees along a chromosome, while changes in selective pressure and mutational rates for different genomic regions may lead to shifts in the underlying mutational process. We propose the split score as a general method for quantifying support for a particular phylogenetic relationship within a genomic data set. Because the split score is based on algebraic properties of a matrix of site pattern frequencies, it can be rapidly computed, even for data sets that are large in the number of taxa and/or in the length of the alignment, providing an advantage over other methods (e.g., maximum likelihood) that are often used to assess such support. Using simulation, we explore the properties of the split score, including its dependence on sequence length, branch length, and the size of a split, as well as its ability to detect true splits in the underlying tree. Using a sliding-window analysis, we show that split scores can be used to detect changes in the underlying evolutionary process for genome-scale data from primates, mosquitoes, and viruses in a computationally efficient manner. Computation of the split score has been implemented in the software package SplitSup.
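The abstract says only that the split score rests on algebraic properties of a matrix of site pattern frequencies. The sketch below makes two assumptions about that matrix and score. First, the matrix is taken to be the "flattening" for a split A|B: rows are indexed by the joint bases of the taxa in A, columns by those of the taxa in B. Second, the score is taken to be the relative Frobenius distance from that matrix to the nearest rank-4 matrix. This choice is motivated by the algebraic fact that, under a 4-state Markov model on a tree, the flattening along a true split has rank at most 4. SplitSup's exact definition and normalization may differ. The function names `flattening`, `split_score`, and `sliding_split_scores` are hypothetical and are not SplitSup's API.

```python
import numpy as np

# Hypothetical helpers illustrating a rank-based split score; not SplitSup's API.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def flattening(alignment, split_a):
    """Return the 4^|A| x 4^|B| matrix of site-pattern frequencies for split A|B.

    alignment: dict mapping taxon name -> aligned DNA sequence (equal lengths).
    split_a: collection of taxa on one side of the split; the rest form B.
    Columns containing anything other than A/C/G/T are skipped in this sketch.
    """
    taxa = sorted(alignment)
    side_a = [t for t in taxa if t in split_a]
    side_b = [t for t in taxa if t not in split_a]
    length = len(alignment[taxa[0]])
    flat = np.zeros((4 ** len(side_a), 4 ** len(side_b)))
    used = 0
    for site in range(length):
        column = {t: alignment[t][site].upper() for t in taxa}
        if any(base not in BASE_INDEX for base in column.values()):
            continue  # gap or ambiguity code: ignore this site
        # Encode the bases on each side as a base-4 integer (row / column index).
        row = 0
        for t in side_a:
            row = 4 * row + BASE_INDEX[column[t]]
        col = 0
        for t in side_b:
            col = 4 * col + BASE_INDEX[column[t]]
        flat[row, col] += 1
        used += 1
    # Convert counts to relative frequencies.
    return flat / used if used else flat


def split_score(flat, rank=4):
    """Relative Frobenius distance from `flat` to the nearest rank-`rank` matrix.

    0 means the matrix already has rank <= `rank` (consistent with A|B being a
    split of the tree); larger values (at most 1) mean weaker support.
    """
    sing = np.linalg.svd(flat, compute_uv=False)  # singular values, descending
    total = float(np.sum(sing ** 2))
    if total == 0.0:
        return 0.0
    # By Eckart-Young, the discarded singular values give the residual norm.
    return float(np.sqrt(np.sum(sing[rank:] ** 2) / total))


def sliding_split_scores(alignment, split_a, window, step):
    """Score split A|B in consecutive windows; returns (start, score) pairs."""
    length = len(next(iter(alignment.values())))
    results = []
    for start in range(0, length - window + 1, step):
        sub = {t: seq[start:start + window] for t, seq in alignment.items()}
        results.append((start, split_score(flattening(sub, split_a))))
    return results
```

For n taxa, this dense matrix has 4^n entries, so the sketch only suits small examples. An alignment of length L fills at most L cells, which suggests that a practical implementation would use sparse storage. A split with a single taxon on one side produces a matrix with only 4 rows, so under this definition its score is always 0 and carries no information.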