Casellas Joaquim, Ibáñez-Escriche Noelia, García-Cortés Luis Alberto, Varona Luis
Genètica i Millora Animal, IRTA-Lleida, 25198 Lleida, Spain.
Genet Sel Evol. 2008 Jul-Aug;40(4):395-413. doi: 10.1186/1297-9686-40-4-395. Epub 2008 Jun 17.
Student t mixed models have been suggested in animal breeding as a useful statistical tool to mute the impact of preferential treatment or other sources of outliers in field data. Nevertheless, these additional sources of variation are undeclared, so it is not known in advance whether a Student t mixed model is required or whether a standard, less parameterized Gaussian mixed model would be sufficient for the intended purpose. Within this context, our aim was to develop the Bayes factor between two nested models that differ only in a bounded variable, in order to easily compare a Student t and a Gaussian mixed model. The key point is that the Student t density converges to a Gaussian density as the degrees of freedom tend to infinity, so the two models can be viewed as nested models that differ only in their degrees of freedom. The Bayes factor can be easily calculated from the output of a Markov chain Monte Carlo sampling of the complex model (the Student t mixed model). The performance of this Bayes factor was tested under simulation and on a real dataset, using the deviance information criterion (DIC) as the standard reference criterion. The two statistical tools showed similar trends along the parameter space, although the Bayes factor appeared to be the more conservative. There was considerable evidence favoring the Student t mixed model for datasets simulated under Student t processes with limited degrees of freedom, and a moderate advantage for the Gaussian mixed model for datasets simulated with 50 or more degrees of freedom. For the analysis of real data (weight of Pietrain pigs at six months), both the Bayes factor and DIC slightly favored the Student t mixed model, suggesting a low incidence of outlier individuals in this population.
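The nesting argument rests on the Student t density approaching the Gaussian density as the degrees of freedom grow. The following is a minimal numerical sketch of that convergence only, not the authors' Bayes factor procedure; the function names, the evaluation grid, and the chosen degrees of freedom are illustrative assumptions.

```python
import math


def student_t_logpdf(x, df, loc=0.0, scale=1.0):
    """Log-density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1) / 2 * math.log1p(z * z / df)
    )


def gaussian_logpdf(x, loc=0.0, scale=1.0):
    """Log-density of a Gaussian distribution."""
    z = (x - loc) / scale
    return -0.5 * math.log(2 * math.pi) - math.log(scale) - 0.5 * z * z


def max_abs_gap(df, grid=None):
    """Largest absolute log-density gap between t(df) and N(0, 1) on a grid."""
    if grid is None:
        # Hypothetical grid: standardized residuals from -4 to 4 in steps of 0.1.
        grid = [i / 10 for i in range(-40, 41)]
    return max(abs(student_t_logpdf(x, df) - gaussian_logpdf(x)) for x in grid)


# Illustrative degrees of freedom; the gap shrinks as df increases.
gaps = {df: max_abs_gap(df) for df in (3, 10, 50, 1000)}

if __name__ == "__main__":
    for df, gap in gaps.items():
        print(f"df={df:>4}: max |log t - log N| on grid = {gap:.4f}")
```

With few degrees of freedom the two log-densities differ most in the tails, which is where outliers sit; as the degrees of freedom increase the gap narrows across the whole grid, which is why the Gaussian model can be treated as the limiting case of the Student t model.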