

A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait.

Authors

Gianola Daniel, Wu Xiao-Lin, Manfredi Eduardo, Simianer Henner

Affiliation

Department of Animal Sciences and Department of Dairy Science, University of Wisconsin-Madison, 1675 Observatory Dr, Madison, WI 53706, USA.

Publication

Genetica. 2010 Oct;138(9-10):959-77. doi: 10.1007/s10709-010-9478-4. Epub 2010 Aug 25.

Abstract

A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some "baseline" family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but it produces an understatement in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. 
For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems to be useful for gauging the number of QTLs affecting the trait: if the number of clusters inferred is small, probably just a few QTLs code for the trait. If the number of clusters inferred is large, this may imply that standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if sample size is not large and if only a few genotypic configurations have replicate phenotypes in the sample.
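The sensitivity of the inferred number of clusters to M can be illustrated with a minimal sketch of the prior alone. Under a Dirichlet process with concentration parameter M, the partition of n draws follows the Chinese restaurant process, and the prior expected number of distinct clusters is the sum of M / (M + i) for i = 0, ..., n - 1. The code below is not the authors' Gibbs sampler and uses no phenotype data; the function names and the grid of M values are hypothetical, and n = 192 is borrowed from the number of half-sib families only for illustration.

```python
import random


def expected_clusters(n, M):
    """Prior expected number of distinct clusters among n draws
    from a Dirichlet process with concentration parameter M."""
    return sum(M / (M + i) for i in range(n))


def simulate_crp(n, M, rng):
    """Draw one partition of n items from the Chinese restaurant process.

    Returns a list of cluster sizes (illustrative helper, not from the paper).
    """
    sizes = []
    for i in range(n):
        # Item i opens a new cluster with probability M / (M + i);
        # otherwise it joins an existing cluster with probability
        # proportional to that cluster's current size.
        if rng.random() < M / (M + i):
            sizes.append(1)
        else:
            r = rng.random() * i  # total size of existing clusters is i
            acc = 0
            for k, s in enumerate(sizes):
                acc += s
                if r < acc:
                    sizes[k] += 1
                    break
    return sizes


if __name__ == "__main__":
    rng = random.Random(2010)
    n = 192  # number of sire families in the broiler example
    for M in (0.1, 1.0, 10.0, 100.0):  # hypothetical values of M
        draws = [len(simulate_crp(n, M, rng)) for _ in range(500)]
        print(M, round(expected_clusters(n, M), 2), sum(draws) / len(draws))
```

With n fixed, the prior expected cluster count grows steadily with M, so in a small sample with little replication a prior that favors large or small M can pull the posterior number of clusters with it, which is consistent with the sensitivity reported for the broiler data.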

