Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.

Affiliation

The Bioinformatics Centre, Department of Biology, University of Copenhagen, DK-2200, Denmark

Publication information

Genetics. 2018 Oct;210(2):719-731. doi: 10.1534/genetics.118.301336. Epub 2018 Aug 21.

DOI:10.1534/genetics.118.301336

PMID:30131346

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6216594/

Abstract

We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
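The iterative heuristic described in the abstract can be illustrated with a minimal sketch. It is not the PCAngsd implementation: the function names, the fixed iteration count, the clipping threshold `eps`, and the crude starting frequencies are all assumptions made for illustration. The idea shown is to alternate between (1) computing posterior expected genotypes from genotype likelihoods, using the current individual allele frequencies as a Hardy-Weinberg prior, and (2) re-estimating the individual allele frequencies from a low-rank (top principal components) reconstruction of the centered expected genotypes.

```python
import numpy as np


def expected_genotypes(gl, freq):
    """Posterior mean genotype for each individual and site.

    gl:   (n, m, 3) array of genotype likelihoods P(data | G = g), g = 0, 1, 2
          (assumed strictly positive in at least one genotype per entry).
    freq: (m,) or (n, m) allele frequencies used as a Hardy-Weinberg prior.
    """
    f = np.broadcast_to(freq, gl.shape[:2])
    prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
    post = gl * prior
    post /= post.sum(axis=-1, keepdims=True)
    return post[..., 1] + 2 * post[..., 2]


def individual_allele_frequencies(gl, n_pcs, n_iter=20, eps=1e-4):
    """Illustrative iteration; n_iter and eps are hypothetical defaults."""
    n, m, _ = gl.shape
    # Crude starting point (an assumption of this sketch): population
    # frequencies from dosages computed under an HWE prior at frequency 0.5.
    pop_freq = expected_genotypes(gl, np.full(m, 0.5)).mean(axis=0) / 2
    pop_freq = np.clip(pop_freq, eps, 1 - eps)
    pi = np.broadcast_to(pop_freq, (n, m)).copy()
    for _ in range(n_iter):
        # Step 1: expected genotypes given current individual frequencies.
        e = expected_genotypes(gl, pi)
        # Step 2: rank-n_pcs reconstruction of the centered genotype matrix.
        centered = e - 2 * pop_freq
        u, s, vt = np.linalg.svd(centered, full_matrices=False)
        low_rank = (u[:, :n_pcs] * s[:n_pcs]) @ vt[:n_pcs]
        # Map back to the frequency scale and keep values inside (0, 1).
        pi = np.clip((low_rank + 2 * pop_freq) / 2, eps, 1 - eps)
    return pi, pop_freq
```

In this sketch the population frequencies stay fixed after initialization and the loop runs for a set number of iterations; a real tool would more plausibly use a convergence criterion. The resulting individual allele frequencies are the quantity the abstract says is reused for the admixture estimation step.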
