Sanderson Jean, Sudoyo Herawati, Karafet Tatiana M, Hammer Michael F, Cox Murray P
Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand.
Eijkman Institute for Molecular Biology, Jakarta, Indonesia.
Genetics. 2015 Jun;200(2):469-81. doi: 10.1534/genetics.115.176842. Epub 2015 Apr 7.
Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as to virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions.
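The abstract outlines a three-stage pipeline: principal components analysis, wavelet decomposition of a local ancestry signal along the chromosome, and approximate Bayesian computation (ABC) for the admixture time. The Python/NumPy sketch below illustrates those stages under loudly stated assumptions; it is not the adwave implementation. The per-SNP PC1 signal construction, the plain discrete Haar transform, the toy block simulator (standing in for the paper's genome-scale sequential coalescent simulations), the Uniform(1, t_max) prior and the simple rejection rule are all illustrative choices, and every function name here (`pc1_ancestry_signal`, `haar_wavelet_variance`, `simulate_toy_signal`, `abc_rejection_time`) is hypothetical. On real data the observed signal would come from `pc1_ancestry_signal`; the demo simulates the signal directly to keep it self-contained.

```python
import numpy as np


def pc1_ancestry_signal(genotypes):
    """Per-SNP contribution to each individual's PC1 score (assumed construction).

    genotypes: (n_individuals, n_snps) allele counts (0/1/2), SNPs in
    chromosome order. Each centred genotype is weighted by its SNP's loading
    on the first principal component, so each row sums to that individual's
    PC1 score; the unsummed row serves as a signal along the chromosome.
    """
    g = np.asarray(genotypes, dtype=float)
    centred = g - g.mean(axis=0)
    # Right singular vectors of the centred matrix are the PC loadings.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred * vt[0]


def haar_wavelet_variance(signal, levels):
    """Mean squared Haar detail coefficient at each scale, finest scale first."""
    approx = np.asarray(signal, dtype=float)
    variances = []
    for _ in range(levels):
        n = (len(approx) // 2) * 2  # drop a trailing odd sample
        if n < 2:
            raise ValueError("signal too short for the requested number of levels")
        even, odd = approx[0:n:2], approx[1:n:2]
        variances.append(np.mean(((even - odd) / np.sqrt(2)) ** 2))
        approx = (even + odd) / np.sqrt(2)  # smoothed signal for the next, coarser scale
    return np.array(variances)


def simulate_toy_signal(t_gens, n_markers, length_morgans, alpha, rng):
    """Hypothetical toy model, NOT the paper's coalescent simulations.

    Ancestry breakpoints fall as a Poisson process with t_gens breakpoints per
    Morgan; each block is +1 (source A, probability alpha) or -1 (source B),
    plus Gaussian noise standing in for genotype-level variation.
    """
    positions = np.linspace(0.0, length_morgans, n_markers)
    n_breaks = rng.poisson(t_gens * length_morgans)
    breaks = np.sort(rng.uniform(0.0, length_morgans, n_breaks))
    block_ancestry = np.where(rng.random(n_breaks + 1) < alpha, 1.0, -1.0)
    # searchsorted maps each marker to the block it falls in (0..n_breaks).
    return block_ancestry[np.searchsorted(breaks, positions)] + rng.normal(0.0, 0.5, n_markers)


def abc_rejection_time(observed, n_sims, accept, levels, rng, t_max=200.0,
                       n_markers=1024, length_morgans=1.0, alpha=0.5):
    """Rejection ABC for admixture time T under an assumed Uniform(1, t_max) prior."""
    times = rng.uniform(1.0, t_max, n_sims)
    stats = np.array([
        haar_wavelet_variance(
            simulate_toy_signal(t, n_markers, length_morgans, alpha, rng), levels)
        for t in times
    ])
    # Keep the fraction `accept` of draws whose summaries lie closest to the data.
    distances = np.linalg.norm(stats - observed, axis=1)
    keep = np.argsort(distances)[: max(1, int(accept * n_sims))]
    return times[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    observed = haar_wavelet_variance(simulate_toy_signal(50, 1024, 1.0, 0.5, rng), levels=8)
    posterior = abc_rejection_time(observed, n_sims=2000, accept=0.02, levels=8, rng=rng)
    print(f"posterior mean T = {posterior.mean():.1f}, "
          f"95% interval = {np.percentile(posterior, [2.5, 97.5])}")
```

The wavelet variances work as ABC summary statistics because older admixture leaves shorter ancestry blocks, which moves signal energy toward finer wavelet scales; the spread of the accepted times gives the associated uncertainty.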