The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200 Copenhagen N.
Genetics. 2013 Nov;195(3):693-702. doi: 10.1534/genetics.113.154138. Epub 2013 Sep 11.
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next-generation sequencing technologies, it is possible to obtain genetic data for all accessible genetic variation in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes into account the uncertainty inherent in next-generation sequencing data. This is achieved by working directly with genotype likelihoods, which contain all relevant information about the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method is highly accurate even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software, available at http://www.popgen.dk/software.
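To illustrate what working directly with genotype likelihoods can look like, here is a minimal sketch for a single individual at a single biallelic site. It is not taken from the NGSadmix source code. It assumes the standard admixture model in which the individual's allele frequency is a mixture of ancestral population frequencies, and it assumes Hardy-Weinberg proportions for the genotype prior. All function and variable names (`gl`, `q`, `f`, and so on) are hypothetical and chosen for illustration only.

```python
import math


def individual_allele_frequency(q, f):
    """Mix ancestral allele frequencies f[k] by admixture proportions q[k]."""
    return sum(qk * fk for qk, fk in zip(q, f))


def genotype_prior(h):
    """Prior over genotypes 0, 1, 2 (copies of the allele) given frequency h.

    Assumption: Hardy-Weinberg proportions, i.e. Binomial(2, h).
    """
    return [(1.0 - h) ** 2, 2.0 * h * (1.0 - h), h ** 2]


def site_log_likelihood(gl, q, f):
    """Log-likelihood of the sequencing data at one site.

    gl[g] is the genotype likelihood P(data | genotype = g) for g = 0, 1, 2.
    The unobserved genotype is summed out rather than called.
    """
    prior = genotype_prior(individual_allele_frequency(q, f))
    return math.log(sum(l * p for l, p in zip(gl, prior)))


def genotype_posterior(gl, q, f):
    """Posterior P(genotype = g | data) under the same assumptions."""
    prior = genotype_prior(individual_allele_frequency(q, f))
    joint = [l * p for l, p in zip(gl, prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    gl = [0.9, 0.1, 0.0]   # illustrative genotype likelihoods, not real data
    q = [0.5, 0.5]         # illustrative admixture proportions (2 populations)
    f = [0.2, 0.8]         # illustrative ancestral allele frequencies
    print(site_log_likelihood(gl, q, f))
    print(genotype_posterior(gl, q, f))
```

Because the genotype is summed out instead of being fixed to a single called value, a low-depth site with ambiguous likelihoods contributes a correspondingly weak signal to the estimate, whereas a hard genotype call would treat the same site as certain. A full analysis would combine such terms over all individuals and sites and optimize the admixture proportions and allele frequencies jointly.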