Tsai Ming-Chi, Blelloch Guy, Ravi R, Schwartz Russell
IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct;10(5):1137-49. doi: 10.1109/tcbb.2013.98.
Detecting and quantifying the timing and the genetic contributions of parental populations to a hybrid population is an important but challenging problem in reconstructing evolutionary histories from genetic variation data. With the advent of high-throughput genotyping technologies, new methods suitable for large-scale data are especially needed. Furthermore, existing methods typically assume that the assignment of individuals to subpopulations is known, when that assignment is itself a difficult problem, often unresolved for real data. Here, we propose a novel method that combines prior work on inferring non-reticulate population structures with an MCMC scheme for sampling over admixture scenarios, both to identify population assignments and to learn divergence times and admixture proportions for those populations from genome-scale admixed genetic variation data. We validated our method using coalescent simulations and a collection of real bovine and human variation data. On simulated sequences, our method shows better accuracy and shorter run times than leading competing methods in estimating admixture fractions and divergence times. Analysis of the real data further shows our method to be effective at matching the best current knowledge about the relevant populations.
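To make the idea of sampling an admixture proportion by MCMC concrete, the following is a minimal sketch and not the method of the paper. It assumes a deliberately simplified model that the abstract does not specify: two parental populations with known allele frequencies `p1` and `p2`, an admixed population whose allele frequency at each locus is `alpha * p1 + (1 - alpha) * p2`, binomially distributed allele counts, a uniform prior on `alpha`, and a random-walk Metropolis sampler. All function names and parameters here are hypothetical.

```python
import math
import random


def log_likelihood(alpha, p1, p2, counts, n):
    """Binomial log-likelihood (up to a constant) of admixed allele counts.

    Assumed toy model: frequency at each locus is alpha*p1 + (1-alpha)*p2.
    """
    total = 0.0
    for a, b, k in zip(p1, p2, counts):
        f = alpha * a + (1.0 - alpha) * b
        f = min(max(f, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += k * math.log(f) + (n - k) * math.log(1.0 - f)
    return total


def sample_alpha(p1, p2, counts, n, steps=5000, burn_in=1000,
                 step_size=0.05, seed=0):
    """Random-walk Metropolis sampler for alpha with a uniform prior on (0, 1)."""
    rng = random.Random(seed)
    alpha = 0.5
    current = log_likelihood(alpha, p1, p2, counts, n)
    samples = []
    for i in range(steps):
        proposal = alpha + rng.gauss(0.0, step_size)
        # Proposals outside (0, 1) have zero prior mass and are rejected.
        if 0.0 < proposal < 1.0:
            candidate = log_likelihood(proposal, p1, p2, counts, n)
            accept_prob = math.exp(min(0.0, candidate - current))
            if rng.random() < accept_prob:
                alpha, current = proposal, candidate
        if i >= burn_in:
            samples.append(alpha)
    return samples


def simulate(true_alpha, loci, n, seed=1):
    """Draw parental frequencies and admixed allele counts under the toy model."""
    rng = random.Random(seed)
    p1 = [rng.uniform(0.05, 0.95) for _ in range(loci)]
    p2 = [rng.uniform(0.05, 0.95) for _ in range(loci)]
    counts = []
    for a, b in zip(p1, p2):
        f = true_alpha * a + (1.0 - true_alpha) * b
        counts.append(sum(rng.random() < f for _ in range(n)))
    return p1, p2, counts


if __name__ == "__main__":
    p1, p2, counts = simulate(true_alpha=0.3, loci=200, n=100)
    samples = sample_alpha(p1, p2, counts, n=100)
    print("posterior mean of alpha:", sum(samples) / len(samples))
```

The sketch fixes the population assignments and the parental frequencies, which is exactly the assumption the abstract says the proposed method avoids; it is meant only to show the sampling step in isolation.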