Cardoso F F, Rosa G J M, Tempelman R J
EMBRAPA Pecuária Sul (Brazilian Agricultural Research Corporation South Cattle and Sheep Center), Bagé, RS 96401-970.
J Anim Sci. 2007 Apr;85(4):909-18. doi: 10.2527/jas.2006-668. Epub 2006 Dec 18.
The objectives of this study were to demonstrate the utility of hierarchical Bayesian models that combine residual heteroskedasticity with robustness for outlier detection and muting, and to evaluate the effects of such joint modeling in multibreed genetic evaluations. A 3 × 2 factorial specification of 6 residual variance models, based on distributional (Gaussian, Student's t, or Slash) and variability (homoskedastic or heteroskedastic) assumptions, was used to analyze 22,717 postweaning gain records from a Nelore-Hereford population (40,082 animals in the pedigree). To illustrate the utility of the 2 robust distributional specifications (Student's t and Slash) for outlier detection and muting, 3 records from the same contemporary group (an extreme residual outlier, a mild residual outlier, and a near-zero residual) were chosen for further study. The posterior densities of the corresponding weighting variables of these records were used to assess their degree of Gaussian outlyingness and the ability of the robust models to mute the effects of deviant records. The Student's t heteroskedastic model provided the best fit among the 6 specifications and was preferred for genetic merit inference. Kendall rank correlations of the posterior means of the animals' additive genetic effects, used to compare the selection order of the Student's t and Gaussian models, were reasonably high across all animals within the most frequent genotypes, ranging from 0.83 to 0.91 and from 0.89 to 0.95 for the homoskedastic and heteroskedastic versions, respectively. However, when only animals ranked in the top 10% by the customary Gaussian homoskedastic model were considered, these rank correlations were reduced considerably, ranging from 0.29 to 0.57 and from 0.72 to 0.85 between the 2 residual densities within the homoskedastic and heteroskedastic versions, respectively.
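The outlier detection and muting described above work through per-record weighting variables. As a minimal sketch, assuming the standard scale-mixture-of-normals representation (e_i | w_i ~ N(0, σ_i²/w_i), with w_i ~ Gamma(ν/2, rate ν/2) for Student's t and w_i ~ Beta(ν, 1) for Slash), the Python below computes the conditional mean of w_i given a residual e_i, its variance σ_i², and ν. Holding σ_i² and ν fixed is a simplification: the study summarizes full posterior densities from a hierarchical Bayesian model. The function names, residual values, and ν are hypothetical.

```python
import math


def student_t_weight_mean(residual: float, sigma2: float, nu: float) -> float:
    """Conditional mean of w_i for a Student's t residual (assumed hierarchy).

    e_i | w_i ~ N(0, sigma2 / w_i), w_i ~ Gamma(nu/2, rate=nu/2), so
    w_i | e_i ~ Gamma((nu + 1)/2, rate=(nu + e_i^2 / sigma2)/2).
    """
    return (nu + 1.0) / (nu + residual ** 2 / sigma2)


def _simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def slash_weight_mean(residual: float, sigma2: float, nu: float) -> float:
    """Conditional mean of w_i for a Slash residual, w_i ~ Beta(nu, 1); assumes nu > 0.5.

    w_i | e_i is a Gamma(nu + 1/2, rate=e_i^2 / (2 sigma2)) truncated to (0, 1),
    so its mean is a ratio of two integrals over (0, 1), evaluated numerically.
    """
    rate = residual ** 2 / (2.0 * sigma2)
    num = _simpson(lambda w: w ** (nu + 0.5) * math.exp(-rate * w), 0.0, 1.0)
    den = _simpson(lambda w: w ** (nu - 0.5) * math.exp(-rate * w), 0.0, 1.0)
    return num / den


if __name__ == "__main__":
    nu = 4.0  # hypothetical degrees-of-freedom parameter
    # Hypothetical near-zero, mild, and extreme residuals (residual SD of 1).
    for e in (0.1, 2.5, 6.0):
        print(f"e={e:4.1f}  t-weight={student_t_weight_mean(e, 1.0, nu):.3f}  "
              f"slash-weight={slash_weight_mean(e, 1.0, nu):.3f}")
    # Same raw residual in a low-variance vs a high-variance group.
    print(student_t_weight_mean(3.0, 1.0, nu), student_t_weight_mean(3.0, 4.0, nu))
```

Small weights flag a record as a Gaussian outlier and shrink its influence on the fit. The last line shows why heteroskedasticity and robustness interact: a residual of 3.0 gets a weight of 5/13 ≈ 0.385 when σ_i² = 1 but 0.8 when σ_i² = 4, so modeling the residual variance changes which records are treated as deviant.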
Rank correlations between the homoskedastic and heteroskedastic versions within each of the Gaussian and Student's t error models tended to be smaller, ranging from 0.68 to 0.90 across all animals and from 0.28 to 0.67 for animals ranked in the top 10%. These results support the implementation of robust models that account for sources of heteroskedasticity, to increase the precision and stability of multibreed genetic evaluations through proper statistical treatment of deviant records.
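The contrast between rank correlations over all animals and over the top 10% can be reproduced in miniature. The sketch below implements Kendall's tau-b in plain Python and a hypothetical helper that restricts the comparison to animals ranked in the top fraction by a reference model. The tie-corrected tau-b variant, the function names, and the 20-animal data are assumptions for illustration, not the study's procedure or results.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2), tie-corrected)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            ties_x += 1  # includes pairs tied in both sequences
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) / 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b undefined when one sequence is constant")
    return (concordant - discordant) / denom


def top_fraction_tau(reference, alternative, fraction=0.10):
    """Hypothetical helper: tau-b among animals in the top `fraction` of `reference`."""
    n_top = max(2, int(len(reference) * fraction))
    order = sorted(range(len(reference)), key=lambda k: reference[k], reverse=True)
    top = order[:n_top]
    return kendall_tau_b([reference[k] for k in top], [alternative[k] for k in top])


if __name__ == "__main__":
    # Hypothetical breeding values for 20 animals: the alternative model agrees
    # everywhere except that it swaps the two best animals.
    gaussian = [float(k) for k in range(20)]
    student_t = gaussian.copy()
    student_t[18], student_t[19] = student_t[19], student_t[18]
    print(kendall_tau_b(gaussian, student_t))     # 188/190 ≈ 0.989
    print(top_fraction_tau(gaussian, student_t))  # -1.0 within the top 10%
```

A single reordering at the top barely moves the correlation over all animals but reverses it among the selected ones, which is why the abstract reports the top-10% correlations separately.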