Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark.
Center for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K, Denmark.
Bioinformatics. 2017 Oct 1;33(19):3148-3150. doi: 10.1093/bioinformatics/btx474.
Estimation of admixture proportions and principal component analysis (PCA) are fundamental tools in population genetics. However, applying these methods to low- or mid-depth sequencing data without taking genotype uncertainty into account can introduce biases.
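To see why genotype uncertainty matters at low depth, the sketch below computes genotype likelihoods under a simple binomial read model with a per-read error rate. This is a common textbook model and an assumption here, not necessarily the likelihood model used to produce fastNGSadmix input. The function names `genotype_likelihoods` and `prob_het_looks_homozygous` are hypothetical.

```python
def genotype_likelihoods(n_a: int, depth: int, err: float = 0.01) -> list:
    """Return p(reads | g) for g = 0, 1, 2 copies of allele A.

    Assumed model: each read independently shows allele A with probability
    (g/2)(1 - err) + (1 - g/2) err. The binomial coefficient is dropped
    because it is the same for every genotype.
    """
    lik = []
    for g in (0, 1, 2):
        # Probability that a single read shows allele A given genotype g.
        p_a = (g / 2) * (1 - err) + (1 - g / 2) * err
        lik.append(p_a ** n_a * (1 - p_a) ** (depth - n_a))
    return lik


def prob_het_looks_homozygous(depth: int) -> float:
    """Chance that every read from a heterozygote carries the same allele (no errors)."""
    return 2 * 0.5 ** depth


# Two reads, both showing allele A.
print(genotype_likelihoods(2, 2))
print(prob_het_looks_homozygous(2))
```

With two reads that both show allele A, the heterozygote keeps a likelihood of about 0.25 against about 0.98 for the AA homozygote, so a hard AA call discards real uncertainty. At depth 2 a heterozygote shows only one allele half of the time, which is the kind of systematic error that biases downstream admixture and PCA estimates.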
Here we present fastNGSadmix, a tool for quickly and reliably estimating admixture proportions and performing PCA from the next-generation sequencing data of a single individual. The analyses are based on genotype likelihoods of the input sample and a set of predefined reference populations. The method has high accuracy, even at low sequencing depth, and it corrects for the biases introduced by small reference populations.
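To illustrate how admixture proportions can be estimated from one sample's genotype likelihoods against fixed reference allele frequencies, the following is a minimal expectation-maximization (EM) sketch. It assumes that each allele copy at site j descends from reference population k with probability q_k, so the individual allele frequency is h_j = Σ_k q_k f_jk, and that the genotype prior is Binomial(2, h_j). It is not fastNGSadmix's C++ code and omits its correction for small reference panels. The names `estimate_admixture` and `simulate_sample` are hypothetical.

```python
import random
from typing import List, Sequence


def estimate_admixture(gls: Sequence[Sequence[float]],
                       ref_freqs: Sequence[Sequence[float]],
                       n_iter: int = 500, tol: float = 1e-8,
                       eps: float = 1e-6) -> List[float]:
    """EM for admixture proportions q with reference frequencies held fixed.

    gls[j][g]       : p(reads at site j | g copies of allele A), g = 0, 1, 2
    ref_freqs[j][k] : frequency of allele A in reference population k
    """
    n_sites, n_pops = len(gls), len(ref_freqs[0])
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        for gl, f_row in zip(gls, ref_freqs):
            # Clip frequencies so h stays strictly inside (0, 1).
            f = [min(max(x, eps), 1 - eps) for x in f_row]
            h = sum(qk * fk for qk, fk in zip(q, f))
            # E-step, part 1: genotype posterior = likelihood x Binomial(2, h) prior.
            prior = [(1 - h) ** 2, 2 * h * (1 - h), h ** 2]
            post = [lik * p for lik, p in zip(gl, prior)]
            total = sum(post)
            post = [p / total for p in post]
            exp_a = post[1] + 2 * post[2]  # expected copies of allele A
            exp_other = 2 - exp_a          # expected copies of the other allele
            # E-step, part 2: expected number of the two copies from population k.
            for k in range(n_pops):
                new_q[k] += (exp_a * q[k] * f[k] / h
                             + exp_other * q[k] * (1 - f[k]) / (1 - h))
        # M-step: each site contributes two allele copies.
        new_q = [x / (2 * n_sites) for x in new_q]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


def simulate_sample(true_q: Sequence[float], n_sites: int, seed: int = 1):
    """Toy data: random reference frequencies and error-free genotype likelihoods."""
    rng = random.Random(seed)
    gls, freqs = [], []
    for _ in range(n_sites):
        f = [rng.uniform(0.05, 0.95) for _ in true_q]
        h = sum(qk * fk for qk, fk in zip(true_q, f))
        g = int(rng.random() < h) + int(rng.random() < h)  # Binomial(2, h) draw
        gls.append([1.0 if x == g else 0.0 for x in range(3)])
        freqs.append(f)
    return gls, freqs


gls, freqs = simulate_sample([0.3, 0.7], 1000, seed=42)
print(estimate_admixture(gls, freqs))  # should land near [0.3, 0.7]
```

Because the expected ancestry counts are linear in the genotype, averaging over the genotype posterior only requires the expected allele count at each site. The per-site contributions always sum to 2, so the updated q stays on the simplex without an explicit normalization step. Low-depth data would simply pass less peaked likelihood vectors in `gls`.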
The admixture estimation method is implemented in C++ and the PCA method is implemented in R. The code is freely available at http://www.popgen.dk/software/index.php/FastNGSadmix.
Supplementary data are available at Bioinformatics online.