A Bayesian hierarchical model for analysis of SNP diversity in multilocus, multipopulation samples.

Author information

Guo Feng, Dey Dipak K, Holsinger Kent E

Affiliation

Feng Guo is Assistant Professor of Statistics, Department of Statistics, Virginia Tech, Blacksburg, VA 24061 (email:

Publication information

J Am Stat Assoc. 2009 Mar 1;104(485):142-154. doi: 10.1198/jasa.2009.0010.

Abstract

The distribution of genetic variation among populations is conveniently measured by Wright's F(ST), which is a scaled variance taking on values in [0,1]. For certain types of genetic markers, and for single-nucleotide polymorphisms (SNPs) in particular, it is reasonable to presume that allelic differences at most loci are selectively neutral. For such loci, the distribution of genetic variation among populations is determined by the size of local populations, the pattern and rate of migration among those populations, and the rate of mutation. Because the demographic parameters (population sizes and migration rates) are common across all autosomal loci, locus-specific estimates of F(ST) will depart from a common distribution only for loci with unusually high or low rates of mutation or for loci that are closely associated with genomic regions having a relationship with fitness. Thus, loci that are statistical outliers showing significantly more among-population differentiation than others may mark genomic regions subject to diversifying selection among the sample populations. Similarly, statistical outliers showing significantly less differentiation among populations than others may mark genomic regions subject to stabilizing selection across the sample populations. We propose several Bayesian hierarchical models to estimate locus-specific effects on F(ST), and we apply these models to single nucleotide polymorphism data from the HapMap project. Because loci that are physically associated with one another are likely to show similar patterns of variation, we introduce conditional autoregressive models to incorporate the local correlation among loci for high-resolution genomic data. We estimate the posterior distributions of model parameters using Markov chain Monte Carlo (MCMC) simulations. 
Model comparison using several criteria, including DIC and LPML, reveals that a model with locus- and population-specific effects is superior to other models for the data used in the analysis. To detect statistical outliers we propose an approach that measures divergence between the posterior distributions of locus-specific effects and the common F(ST) with the Kullback-Leibler divergence measure. We calibrate this measure by comparing values with those produced from the divergence between a biased and a fair coin. We conduct a simulation study to illustrate the performance of our approach for detecting loci subject to stabilizing/divergent selection, and we apply the proposed models to low- and high-resolution SNP data from the HapMap project. Model comparison using DIC and LPML reveals that CAR models are superior to alternative models for the high-resolution data. For both low- and high-resolution data, we identify statistical outliers that are associated with known genes.
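
The coin calibration mentioned in the abstract can be illustrated with a short sketch. The abstract does not state the direction of the divergence or the exact formula used, so the code below assumes the common convention of measuring the Kullback-Leibler divergence from a fair coin, Bernoulli(0.5), to a biased coin, Bernoulli(p), which gives k = -log(4p(1-p))/2 and the inverse p = (1 + sqrt(1 - exp(-2k)))/2. The function names are hypothetical and chosen only for illustration.

```python
import math


def kl_fair_vs_biased(p: float) -> float:
    """KL divergence from a fair coin Bernoulli(0.5) to a biased coin Bernoulli(p).

    Assumed convention (not confirmed by the abstract):
    KL = 0.5*log(0.5/p) + 0.5*log(0.5/(1-p)) = -0.5*log(4*p*(1-p)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -0.5 * math.log(4.0 * p * (1.0 - p))


def calibrate(k: float) -> float:
    """Return the coin bias p >= 0.5 whose divergence from a fair coin equals k."""
    if k < 0.0:
        raise ValueError("a KL divergence cannot be negative")
    # Solve 4*p*(1-p) = exp(-2k) for the root with p >= 0.5.
    return 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-2.0 * k)))


if __name__ == "__main__":
    # Illustrative divergence values only; they are not results from the paper.
    for k in (0.01, 0.1, 0.5, 1.0):
        print(f"KL = {k:.2f} is comparable to a coin with p = {calibrate(k):.3f}")
```

Read this way, a divergence of zero corresponds to a fair coin, and larger divergences map to coins that are progressively easier to tell apart from a fair one, which puts an otherwise unitless divergence on an intuitive scale.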
