Thorsrud Joseph A, Evans Katy M, Quigley Kyle C, Srikanth Krishnamoorthy, Huson Heather J
Department of Animal Sciences, College of Agriculture and Life Sciences, Cornell University, 201 Morrison Hall, 507 Tower Road, Ithaca, NY 14853, USA.
The Seeing Eye Inc., 1 Seeing Eye Wy, Morristown, NJ 07960, USA.
Animals (Basel). 2025 Feb 2;15(3):408. doi: 10.3390/ani15030408.
This study investigates the efficacy of five genomic prediction models, Genomic Best Linear Unbiased Prediction (GBLUP), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and Multilayer Perceptron (MLP), in predicting genomic estimated breeding values (gEBVs). The phenotypic data include three binary health traits (anodontia, distichiasis, oral papillomatosis) and one behavioral trait (distraction) in a population of guide dogs. These traits affect a guide dog's potential for success and are therefore routinely characterized; they were chosen here for their differences in heritability and case counts, specifically to assess gEBV model performance. Using a dataset from The Seeing Eye organization, which includes German Shepherds (n = 482), Golden Retrievers (n = 239), Labrador Retrievers (n = 1188), and Labrador and Golden Retriever crosses (n = 111), we assessed model performance within and across breeds and across trait heritabilities, case counts, and SNP marker densities. We found no significant differences in performance among the models across the range of heritabilities, case counts, and SNP densities tested, with all models performing similarly. Because it requires no parameter optimization, GBLUP was the most efficient model. Distichiasis showed the highest overall predictive performance, likely due to its higher heritability, while anodontia and distraction exhibited moderate accuracy, and oral papillomatosis had the lowest accuracy, consistent with its low heritability. These findings indicate that gEBVs can be constructed effectively from lower-density SNP datasets, suggesting that high-cost, high-density genotyping may not always be necessary. Additionally, the similar performance of all models indicates that simpler models like GBLUP, which require less fine-tuning, may be sufficient for genomic prediction in canine breeding programs.
The research highlights the importance of standardized phenotypic assessments and carefully constructed reference populations to optimize the utility of genomic selection in canine breeding programs.
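To illustrate why GBLUP needs little tuning, the following is a minimal sketch rather than the authors' pipeline. It assumes a VanRaden-style genomic relationship matrix, a heritability value supplied by the user, a phenotype treated on a continuous scale (the binary traits above would need a threshold or observed-scale treatment), and purely synthetic genotypes. The function names `vanraden_grm` and `gblup_predict` are hypothetical.

```python
import numpy as np

def vanraden_grm(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (animals x SNPs)."""
    p = genotypes.mean(axis=0) / 2.0           # allele frequency per SNP
    Z = genotypes - 2.0 * p                    # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2):
    """Predict breeding values of `test` animals from `train` phenotypes.

    Assumption: h2 is known, and the overall mean is estimated by the
    training-set average instead of being fitted jointly.
    """
    lam = (1.0 - h2) / h2                      # residual-to-genetic variance ratio
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return G[np.ix_(test, train)] @ alpha

# Synthetic example data (not from the study).
rng = np.random.default_rng(0)
n_animals, n_snps = 200, 500
genotypes = rng.binomial(2, 0.3, size=(n_animals, n_snps)).astype(float)
true_bv = genotypes @ rng.normal(0.0, 1.0, n_snps)
true_bv = (true_bv - true_bv.mean()) / true_bv.std()
y = true_bv + rng.normal(0.0, 1.0, n_animals)  # genetic and residual variance both near 1

G = vanraden_grm(genotypes)
idx = rng.permutation(n_animals)
train, test = idx[:160], idx[160:]
pred = gblup_predict(G, y, train, test, h2=0.5)
```

The only quantity supplied beyond the data is the variance ratio derived from `h2`, which in practice would be estimated (for example by REML) rather than searched over a grid. This contrasts with the hyperparameter searches that RF, SVM, XGB, and MLP models typically involve.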