POPSTR: Inference of Admixed Population Structure Based on Single-Nucleotide Polymorphisms and Copy Number Variations.

Authors

Ahn Jaeil, Conkright Brian, Boca Simina M, Madhavan Subha

Affiliations

1 Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University, Washington, District of Columbia.

2 Innovation Center for Biomedical Informatics, Georgetown University, Washington, District of Columbia.

Publication

J Comput Biol. 2018 Apr;25(4):417-429. doi: 10.1089/cmb.2017.0127. Epub 2018 Jan 2.

Abstract

Statistical approaches for population structure estimation have been predominantly driven by a particular data type, single-nucleotide polymorphisms (SNPs). However, in the presence of weak identifiability in SNPs, population structure estimation can suffer from undesirable accuracy loss. Copy number variations (CNVs) are genomic structural variants with loci that are commonly shared within a specific population and thus provide valuable information for estimation of the ancestry of sampled populations. We develop a Bayesian joint modeling framework of SNPs and CNVs, called POPSTR, to better understand population structure than approaches that use SNPs alone. To deal with the increased data volume, we use the Metropolis-adjusted Langevin algorithm (MALA), which guides sampling toward the target distribution in a computationally efficient way. We illustrate applications of our approach using the HapMap 2005 project data. We carry out simulation studies and show that the performance of our approach is comparable to or better than that of popular benchmarks, STRUCTURE and ADMIXTURE. We also observe that using only CNVs can be remarkably efficient if SNP data are not available.
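
For readers unfamiliar with MALA, the following is a minimal generic sketch of the algorithm on a toy target (a two-dimensional standard normal). It is not the POPSTR sampler: the function name `mala_sample`, the step size, and the toy target are assumptions chosen for illustration, and the abstract does not describe the authors' actual posterior or tuning. Each iteration proposes a move along the gradient of the log density plus Gaussian noise, then applies a Metropolis-Hastings correction so the chain targets the intended distribution.

```python
import math
import random


def mala_sample(log_density, grad_log_density, x0, step_size, n_samples, seed=0):
    """Generic MALA sampler (illustrative only; not the POPSTR implementation)."""
    rng = random.Random(seed)
    x = list(x0)
    samples, accepted = [], 0

    def drift_mean(point):
        # Mean of the Langevin proposal: a half-step along the gradient.
        grad = grad_log_density(point)
        return [p + 0.5 * step_size * g for p, g in zip(point, grad)]

    def log_proposal(to_point, from_mean):
        # log q(to | from), dropping the constant shared by both directions.
        sq = sum((t - m) ** 2 for t, m in zip(to_point, from_mean))
        return -sq / (2.0 * step_size)

    for _ in range(n_samples):
        mean_fwd = drift_mean(x)
        proposal = [m + math.sqrt(step_size) * rng.gauss(0.0, 1.0) for m in mean_fwd]
        mean_rev = drift_mean(proposal)
        # Metropolis-Hastings log acceptance ratio with asymmetric proposal.
        log_alpha = (log_density(proposal) - log_density(x)
                     + log_proposal(x, mean_rev) - log_proposal(proposal, mean_fwd))
        if math.log(1.0 - rng.random()) < log_alpha:
            x = proposal
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples


# Toy target (an assumption for this sketch): 2-D standard normal.
def log_std_normal(x):
    return -0.5 * sum(v * v for v in x)


def grad_log_std_normal(x):
    return [-v for v in x]


samples, acc_rate = mala_sample(
    log_std_normal, grad_log_std_normal, x0=[3.0, -3.0],
    step_size=0.5, n_samples=20000, seed=1,
)
```

In this sketch the gradient term is what distinguishes MALA from a plain random-walk Metropolis sampler: proposals drift toward regions of higher density, which tends to help as the number of parameters grows.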
