Shariati M M, Korsgaard I R, Sorensen D
Department of Genetics and Biotechnology, Faculty of Agricultural Sciences, University of Aarhus, Tjele, Denmark.
J Anim Breed Genet. 2009 Apr;126(2):92-102. doi: 10.1111/j.1439-0388.2008.00773.x.
Markov chain Monte Carlo (MCMC) enables fitting complex hierarchical models that may adequately reflect the process of data generation. Some of these models may contain more parameters than can be uniquely inferred from the distribution of the data, causing non-identifiability. The reaction norm model with unknown covariates (RNUC) is a model in which unknown environmental effects can be inferred jointly with the remaining parameters. The problem of identifiability of parameters at the level of the likelihood and the associated behaviour of MCMC chains were discussed using the RNUC as an example. It was shown theoretically that when environmental effects (covariates) are considered as random effects, estimable functions of the fixed effects, (co)variance components and genetic effects are identifiable, as are the environmental effects. When the environmental effects are treated as fixed and there are other fixed factors in the model, the contrasts involving environmental effects, the variance of environmental sensitivities (genetic slopes) and the residual variance are the only identifiable parameters. These different identifiability scenarios were generated by changing the formulation of the model and the structure of the data, and the models were then implemented via MCMC. The output of MCMC sampling schemes was interpreted in the light of the theoretical findings. The erratic behaviour of the MCMC chains was shown to be associated with identifiability problems in the likelihood, despite propriety of posterior distributions, achieved by arbitrarily chosen uniform (bounded) priors. In some cases, very long chains were needed before the pattern of behaviour of the chain could signal the existence of problems. The paper serves as a warning concerning the implementation of complex models where identifiability problems can be difficult to detect a priori.
We conclude that it would be good practice to experiment with a proposed model and to understand its features before embarking on a full MCMC implementation.
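The kind of experiment recommended above can be illustrated with a minimal sketch. It is not the RNUC model and not the authors' implementation; it is a hypothetical toy in which observations depend on two parameters `a` and `b` only through their sum, so the likelihood identifies `a + b` but neither parameter alone. Bounded uniform priors (the `bound` argument, an assumption of this sketch) make the posterior proper, yet a random-walk Metropolis chain still drifts along the ridge of the likelihood. All function names, step sizes and sample sizes are illustrative choices.

```python
import numpy as np

def log_post(a, b, y, bound=10.0):
    """Log posterior (up to a constant) for the toy model y_i ~ N(a + b, 1)."""
    # Uniform(-bound, bound) priors on a and b keep the posterior proper.
    if abs(a) > bound or abs(b) > bound:
        return -np.inf
    # The likelihood depends on (a, b) only through a + b.
    return -0.5 * np.sum((y - (a + b)) ** 2)

def metropolis(y, n_iter=20000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for (a, b); returns an (n_iter, 2) array."""
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0
    lp = log_post(a, b, y)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        a_new = a + step * rng.normal()
        b_new = b + step * rng.normal()
        lp_new = log_post(a_new, b_new, y)
        # Accept with probability min(1, exp(lp_new - lp)).
        if np.log(rng.uniform()) < lp_new - lp:
            a, b, lp = a_new, b_new, lp_new
        draws[t] = (a, b)
    return draws

# Simulated data with true a + b = 3 (the split between a and b is arbitrary).
data_rng = np.random.default_rng(0)
y = data_rng.normal(loc=3.0, scale=1.0, size=50)

draws = metropolis(y)
sum_draws = draws[:, 0] + draws[:, 1]

print("posterior sd of a     :", draws[:, 0].std())
print("posterior sd of b     :", draws[:, 1].std())
print("posterior sd of a + b :", sum_draws.std())
```

In a run of this sketch the identifiable function `a + b` should be tightly concentrated near the sample mean, while the marginal draws of `a` and `b` wander over a range set mainly by the arbitrary prior bounds. Comparing the trace of each parameter with the trace of a function known to be identifiable is one inexpensive way to probe a model before committing to a full MCMC implementation.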