POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.

Authors

Ahn Jaeil, Conkright Brian, Boca Simina M, Madhavan Subha

Affiliations

1 Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, Washington, District of Columbia.

2 Innovation Center for Biomedical Informatics, Georgetown University, Washington, District of Columbia.

Publication information

J Comput Biol. 2018 Apr;25(4):417-429. doi: 10.1089/cmb.2017.0127. Epub 2018 Jan 2.

DOI:10.1089/cmb.2017.0127

PMID:29293371

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5915226/

Abstract

Statistical approaches for population structure estimation have been predominantly driven by a particular data type, single-nucleotide polymorphisms (SNPs). However, in the presence of weak identifiability in SNPs, population structure estimation can suffer from undesirable accuracy loss. Copy number variations (CNVs) are genomic structural variants with loci that are commonly shared within a specific population and thus provide valuable information for estimation of the ancestry of sampled populations. We develop a Bayesian joint modeling framework of SNPs and CNVs, called POPSTR, to better understand population structure than approaches that use SNPs solely. To deal with the increased data volume, we use the Metropolis-adjusted Langevin algorithm (MALA) that guides the target distribution in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable to or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.
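
As a rough illustration of the sampler named in the abstract, the following is a minimal sketch of a generic Metropolis-adjusted Langevin step, not the POPSTR implementation. The abstract does not specify the posterior, so a standard bivariate normal stands in as the target; the function names (`mala_sample`, `log_std_normal`, `grad_log_std_normal`), the step size, and the starting point are all assumptions chosen for the example.

```python
import math
import random


def mala_sample(log_target, grad_log_target, x0, step, n_iter, seed=0):
    """Generic MALA sampler (illustrative; not the POPSTR code).

    Proposals follow x' = x + (step / 2) * grad log p(x) + sqrt(step) * noise
    and are accepted or rejected with a Metropolis-Hastings correction.
    """
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    accepted = 0

    def drift(point):
        # Mean of the Langevin proposal centred on `point`.
        grad = grad_log_target(point)
        return [p + 0.5 * step * g for p, g in zip(point, grad)]

    def log_q(to, mean):
        # Log density of N(mean, step * I) at `to`; constants that cancel
        # in the acceptance ratio are dropped.
        return -sum((a - m) ** 2 for a, m in zip(to, mean)) / (2.0 * step)

    for _ in range(n_iter):
        mean_fwd = drift(x)
        prop = [m + math.sqrt(step) * rng.gauss(0.0, 1.0) for m in mean_fwd]
        mean_bwd = drift(prop)
        log_alpha = (log_target(prop) - log_target(x)
                     + log_q(x, mean_bwd) - log_q(prop, mean_fwd))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_alpha:
            x = prop
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_iter


# Stand-in target (assumption): standard bivariate normal.
def log_std_normal(x):
    return -0.5 * sum(v * v for v in x)


def grad_log_std_normal(x):
    return [-v for v in x]


samples, accept_rate = mala_sample(
    log_std_normal, grad_log_std_normal, x0=[3.0, -3.0], step=0.5, n_iter=5000
)
```

The gradient term pulls each proposal toward regions of higher posterior density, which is why MALA typically needs fewer iterations than a random-walk Metropolis sampler when many parameters are updated jointly; the accept/reject step keeps the chain exact despite the discretized drift.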

Similar articles

Copy number variations in the genome of the Qatari population.

BMC Genomics. 2015 Oct 22;16:834. doi: 10.1186/s12864-015-1991-5.

Copy number polymorphisms in new HapMap III and Singapore populations.

J Hum Genet. 2011 Aug;56(8):552-60. doi: 10.1038/jhg.2011.54. Epub 2011 Jun 16.

Prediction of biogeographical ancestry in admixed individuals.

Forensic Sci Int Genet. 2018 Sep;36:104-111. doi: 10.1016/j.fsigen.2018.06.013. Epub 2018 Jun 28.

Family-Based Benchmarking of Copy Number Variation Detection Software.

PLoS One. 2015 Jul 21;10(7):e0133465. doi: 10.1371/journal.pone.0133465. eCollection 2015.

Analysis of Population-Genetic Properties of Copy Number Variations.

Methods Mol Biol. 2018;1833:179-186. doi: 10.1007/978-1-4939-8666-8_14.

AIM-SNPtag: A computationally efficient approach for developing ancestry-informative SNP panels.

Forensic Sci Int Genet. 2019 Jan;38:245-253. doi: 10.1016/j.fsigen.2018.10.015. Epub 2018 Nov 2.

Inference of chromosome-specific copy numbers using population haplotypes.

BMC Bioinformatics. 2011 May 24;12:194. doi: 10.1186/1471-2105-12-194.

fastSTRUCTURE: variational inference of population structure in large SNP data sets.

Genetics. 2014 Jun;197(2):573-89. doi: 10.1534/genetics.114.164350. Epub 2014 Apr 2.

[DNA polymorphisms].

Rinsho Byori. 2013 Nov;61(11):1001-7.

Cited by

Pinpointing the Geographic Origin of 165-Year-Old Human Skeletal Remains Found in Punjab, India: Evidence From Mitochondrial DNA and Stable Isotope Analysis.

Front Genet. 2022 Apr 28;13:813934. doi: 10.3389/fgene.2022.813934. eCollection 2022.

Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac043.
