Gianola Daniel, Simianer Henner, Qanbari Saber
Department of Animal Sciences and Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
Genet Res (Camb). 2010 Apr;92(2):141-55. doi: 10.1017/S0016672310000121.
A two-step procedure is presented for analysing theta (FST) statistics obtained for a battery of loci; the procedure ultimately groups the values into clusters. The first step uses a simple Bayesian model to draw samples from the posterior distributions of the theta-parameters without constructing Markov chains. This step assigns a weakly informative prior to allelic frequencies and makes no assumptions about evolutionary models. The second step treats samples from these posterior distributions as 'data' and fits a sequence of finite mixture models in order to identify clusters of theta-statistics. The expectation is that these clusters reflect different types of processes and thereby assist in interpreting results. The procedures are illustrated with hypothetical data, with published allelic frequency data for type II diabetes in three human populations, and with data for 12 isozyme loci in 12 populations of the argan tree in Morocco.
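The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: loci are taken to be biallelic, the weakly informative prior is a Dirichlet with all parameters equal to 1, theta is computed as the among-population variance of the allele frequency divided by p_bar(1 - p_bar), the mixture components are Gaussian and fitted by plain EM, and models are compared by BIC. All function names (`posterior_theta_samples`, `fit_mixture`, `bic`) are hypothetical.

```python
import math
import random


def draw_dirichlet(alphas, rng):
    """One Dirichlet draw obtained by normalising independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]


def theta_biallelic(p):
    """Assumed definition: among-population variance / (p_bar * (1 - p_bar))."""
    p_bar = sum(p) / len(p)
    var_p = sum((v - p_bar) ** 2 for v in p) / len(p)
    return var_p / (p_bar * (1.0 - p_bar))


def posterior_theta_samples(counts, n_draws=500, prior=1.0, seed=0):
    """Step 1. counts holds one (count_A, count_a) pair per population.

    Each population's frequency has a Dirichlet(prior + counts) posterior,
    so draws are independent and no Markov chain is required.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        p = [draw_dirichlet([prior + a, prior + b], rng)[0] for a, b in counts]
        samples.append(theta_biallelic(p))
    return samples


def _weighted_densities(v, weights, means, variances):
    """Weighted Gaussian density of v under each mixture component."""
    return [w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
            for w, m, s in zip(weights, means, variances)]


def fit_mixture(x, k, n_iter=200):
    """Step 2. EM for a 1-D mixture of k Gaussians (an assumed component family).

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    total_var = max(sum((v - grand_mean) ** 2 for v in x) / n, 1e-12)
    floor = 1e-6 * total_var  # keeps a component from collapsing to a point
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    variances = [total_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for v in x:
            d = _weighted_densities(v, weights, means, variances)
            tot = max(sum(d), 1e-300)
            resp.append([dj / tot for dj in d])
        # M-step: update weights, means and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, floor)
    loglik = sum(math.log(max(sum(_weighted_densities(v, weights, means, variances)),
                              1e-300)) for v in x)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters (k means, k variances, k - 1 weights)."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


if __name__ == "__main__":
    # Hypothetical allele counts for three populations at two loci.
    similar = posterior_theta_samples([(50, 50), (52, 48), (49, 51)])
    divergent = posterior_theta_samples([(90, 10), (50, 50), (10, 90)])
    pooled = similar + divergent
    for k in (1, 2, 3):
        _, means, _, ll = fit_mixture(pooled, k)
        print(k, [round(m, 3) for m in means], round(bic(ll, k, len(pooled)), 1))
```

In this sketch the "sequence of finite mixture models" is simply the loop over k = 1, 2, 3, and the model with the smallest BIC would be preferred. Any other component family or model-selection criterion could be substituted without changing the overall structure.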