Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.

Affiliations

The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200, Denmark

Publication information

Genetics. 2018 Oct;210(2):719-731. doi: 10.1534/genetics.118.301336. Epub 2018 Aug 21.

Abstract

We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
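To make "working directly on genotype likelihoods" concrete, here is a minimal sketch in plain Python. It is not the PCAngsd implementation. It assumes a biallelic site, three genotype likelihoods per individual ordered by the number of alternate alleles (0, 1, 2), and a Hardy-Weinberg prior built from an allele frequency. The function names `posterior_dosage` and `em_allele_frequency` are hypothetical. The second function is the standard population-level EM estimate of a single shared allele frequency, shown only as a simplified analogue of an iterative frequency update. The individual-specific, PCA-based update described in the abstract is not reproduced here.

```python
def posterior_dosage(likelihoods, freq):
    """Expected number of alternate alleles (0..2) for one individual at one site.

    likelihoods: (L0, L1, L2), genotype likelihoods for 0, 1, 2 alternate alleles.
    freq: alternate allele frequency used for the Hardy-Weinberg prior (assumption).
    """
    # Hardy-Weinberg genotype priors for 0, 1 and 2 copies of the alternate allele.
    priors = ((1.0 - freq) ** 2, 2.0 * freq * (1.0 - freq), freq ** 2)
    weights = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("likelihoods and prior have no overlapping support")
    posterior = [w / total for w in weights]
    # Posterior mean genotype: 0 * P(0) + 1 * P(1) + 2 * P(2).
    return posterior[1] + 2.0 * posterior[2]


def em_allele_frequency(site_likelihoods, start=0.25, iterations=100, eps=1e-6):
    """EM estimate of one shared allele frequency at a site (simplified analogue).

    site_likelihoods: one (L0, L1, L2) tuple per individual.
    """
    freq = start
    for _ in range(iterations):
        # Keep the frequency strictly inside (0, 1) so the prior never vanishes.
        freq = min(max(freq, eps), 1.0 - eps)
        dosages = [posterior_dosage(lik, freq) for lik in site_likelihoods]
        # Each individual carries two alleles, hence the factor of 2.
        freq = sum(dosages) / (2.0 * len(site_likelihoods))
    return freq
```

With uninformative likelihoods such as `(1, 1, 1)`, which is typical of a site with no reads, the dosage falls back to twice the prior frequency. This illustrates why the choice of frequency matters more as sequencing depth drops.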
