Naderi S, Bohlouli M, Yin T, König S
Institute of Animal Breeding and Genetics, University of Gießen, Ludwigstr. 21b, 35390, Gießen, Germany.
Anim Genet. 2018 Jun;49(3):178-192. doi: 10.1111/age.12661. Epub 2018 Apr 6.
Holstein Friesian cow training sets were created according to disease incidences. The different datasets were used to investigate the impact of random forest (RF) and genomic BLUP (GBLUP) methodology on genomic prediction accuracies. In addition, single-step genomic BLUP was applied to further verify some specific scenarios. Disease traits included the overall trait categories of (i) claw disorders, (ii) clinical mastitis and (iii) infertility, recorded for 80 741 first-lactation Holstein cows kept in 58 large-scale herds. A subset of 6744 cows was genotyped (50K SNP panel). Response variables for all scenarios were de-regressed proofs (DRPs) and pre-corrected phenotypes (PCPs). Initially, all sick cows were allocated to the testing set, and healthy cows represented the training set. In the subsequent cow allocation schemes, the number of sick cows in the training set increased stepwise, with 10% of the sick cows moved from the testing set to the training set in each step. The sizes of the training and testing sets were kept constant by moving the same number of (randomly selected) healthy cows from the training set to the testing set. For both the RF and GBLUP methods, prediction accuracies were higher for DRPs than for PCPs. With PCPs as the response variable, the highest prediction accuracies were observed when the disease incidences in the training sets reflected the disease incidence in the whole population. Single-step GBLUP further increased prediction accuracies for some selected cow allocation schemes (i.e. accuracies were higher than in the corresponding scenarios with RF or GBLUP). Correlations between genome-wide association study (GWAS) SNP effects and RF importance criteria for single SNPs were moderate, ranging from 0.42 to 0.57, when considering SNPs from all chromosomes or from specific chromosome segments.
RF identified significant SNPs close to potential positional candidate genes: GAS1, GPAT3 and CYP2R1 for clinical mastitis; SPINK5 and SLC26A2 for laminitis; and FGF12 for endometritis.
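The stepwise cow allocation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `allocation_schemes`, the random order in which sick cows are moved, and the reading of "10%" as 10% of all sick cows per step are assumptions made for illustration.

```python
import random


def allocation_schemes(sick_ids, healthy_ids, step_fraction=0.1, seed=1):
    """Return a list of (training, testing) frozensets, one per allocation step.

    Hypothetical sketch (assumption, not the published code):
    step 0 puts all sick cows in testing and all healthy cows in training;
    each later step moves `step_fraction` of the sick cows into training and
    moves the same number of randomly chosen healthy cows into testing,
    so both set sizes stay constant.
    """
    rng = random.Random(seed)
    sick = list(sick_ids)
    sick_set = set(sick)
    if len(healthy_ids) < len(sick):
        raise ValueError("need at least as many healthy cows as sick cows")
    rng.shuffle(sick)  # assumed random order in which sick cows are moved
    n_per_step = max(1, int(round(step_fraction * len(sick))))

    training = set(healthy_ids)
    testing = set(sick)
    schemes = [(frozenset(training), frozenset(testing))]

    moved = 0
    while moved < len(sick):
        batch = sick[moved:moved + n_per_step]
        moved += len(batch)
        # Healthy cows still in training are the candidates for swapping out.
        healthy_in_training = sorted(training - sick_set)
        replacements = rng.sample(healthy_in_training, len(batch))
        # Sick cows: testing -> training; healthy replacements: training -> testing.
        testing.difference_update(batch)
        training.update(batch)
        training.difference_update(replacements)
        testing.update(replacements)
        schemes.append((frozenset(training), frozenset(testing)))
    return schemes
```

For example, with 20 sick and 100 healthy cows, `allocation_schemes(range(100, 120), range(100))` yields 11 schemes in which the training set always holds 100 cows and the testing set 20, while the number of sick cows in training grows by 2 per step.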