Modeling SNP array ascertainment with Approximate Bayesian Computation for demographic inference.

Affiliations

National Laboratory of Genomics for Biodiversity (LANGEBIO), CINVESTAV, Irapuato, 36821, Mexico, Mexico.

Center for Human Identification, University of North Texas Health Science Center, Texas, 76107, USA.

Publication information

Sci Rep. 2018 Jul 5;8(1):10209. doi: 10.1038/s41598-018-28539-y.

Abstract

Single nucleotide polymorphisms (SNPs) in commercial arrays have often been discovered in a small number of samples from selected populations. This ascertainment skews patterns of nucleotide diversity and affects population genetic inferences. We propose a demographic inference pipeline that explicitly models the SNP discovery protocol in an Approximate Bayesian Computation (ABC) framework. We simulated genomic regions according to a demographic model incorporating parameters for the divergence of three well-characterized HapMap populations and recreated the SNP distribution of a commercial array by varying the number of haploid samples and the allele frequency cut-off in the given regions. We then calculated summary statistics obtained from both the ascertained and genomic data and inferred ascertainment and demographic parameters. We implemented our pipeline to study the admixture process that gave rise to the present-day Mexican population. Our estimate of the time of admixture is closer to the historical dates than those in previous works which did not consider ascertainment bias. Although the use of whole genome sequences for demographic inference is becoming the norm, there are still underrepresented areas of the world from where only SNP array data are available. Our inference framework is applicable to those cases and will help with the demographic inference.
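The idea of treating SNP discovery as part of the model can be illustrated with a small rejection-ABC sketch. This is not the authors' pipeline: the site simulator below is a toy stand-in for a demographic (coalescent) simulator, and the function names, the priors on panel size and frequency cut-off, and the two summary statistics are all assumptions chosen for illustration. Only the overall shape follows the abstract: simulate genomic data, ascertain SNPs using a discovery panel of haploid samples and an allele frequency cut-off, summarize, and keep the parameter draws whose summaries fall close to the observed ones.

```python
import math
import random


def simulate_sites(n_sites, n_haploids, rng):
    """Toy stand-in for a demographic simulator (assumption, not the paper's model).

    Each site gets a population allele frequency drawn log-uniformly on
    [0.01, 0.99), which favors rare alleles, and then one allele per haploid.
    Only sites that are polymorphic in the sample are kept.
    """
    sites = []
    while len(sites) < n_sites:
        freq = 0.01 * (99 ** rng.random())
        alleles = [1 if rng.random() < freq else 0 for _ in range(n_haploids)]
        if 0 < sum(alleles) < n_haploids:
            sites.append(alleles)
    return sites


def ascertain(sites, panel_size, maf_cutoff):
    """Mimic SNP discovery: keep sites that are polymorphic in the discovery
    panel (here, hypothetically, the first `panel_size` haploids) with a
    minor allele frequency of at least `maf_cutoff`."""
    kept = []
    for alleles in sites:
        count = sum(alleles[:panel_size])
        maf = min(count, panel_size - count) / panel_size
        if maf > 0 and maf >= maf_cutoff:
            kept.append(alleles)
    return kept


def mean_heterozygosity(sites):
    """Mean expected heterozygosity 2p(1-p) across sites (0.0 if no sites)."""
    if not sites:
        return 0.0
    total = 0.0
    for alleles in sites:
        p = sum(alleles) / len(alleles)
        total += 2 * p * (1 - p)
    return total / len(sites)


def summarize(genomic, array):
    """Illustrative summaries: heterozygosity of the ascertained SNPs and the
    fraction of genomic sites that survive ascertainment."""
    return (mean_heterozygosity(array), len(array) / len(genomic))


def abc_rejection(observed, n_draws, tolerance, rng, n_sites=300, n_haploids=40):
    """Rejection ABC over the two ascertainment parameters.

    Priors (assumed): panel size uniform on 2..20 haploids, frequency
    cut-off uniform on [0, 0.3]. A draw is accepted when the Euclidean
    distance between simulated and observed summaries is within `tolerance`.
    """
    accepted = []
    for _ in range(n_draws):
        panel_size = rng.randint(2, 20)
        maf_cutoff = rng.uniform(0.0, 0.3)
        genomic = simulate_sites(n_sites, n_haploids, rng)
        array = ascertain(genomic, panel_size, maf_cutoff)
        if math.dist(summarize(genomic, array), observed) <= tolerance:
            accepted.append((panel_size, maf_cutoff))
    return accepted
```

In this toy setting, the accepted `(panel_size, maf_cutoff)` pairs approximate a posterior over the ascertainment scheme. A fuller analysis would draw demographic parameters (divergence times, admixture time) from priors in the same loop and pass them to a real simulator in place of `simulate_sites`.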

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cdef/6033855/2f17303880cd/41598_2018_28539_Fig1_HTML.jpg