Random forest estimation of genomic breeding values for disease susceptibility over different disease incidences and genomic architectures in simulated cow calibration groups.

Authors

Naderi S, Yin T, König S

Affiliations

Department of Animal Breeding, University of Kassel, 37213 Witzenhausen, Germany.

Publication information

J Dairy Sci. 2016 Sep;99(9):7261-7273. doi: 10.3168/jds.2016-10887. Epub 2016 Jun 22.

DOI:10.3168/jds.2016-10887
PMID:27344385
Abstract

A simulation study was conducted to investigate the performance of random forest (RF) and genomic BLUP (GBLUP) for genomic predictions of binary disease traits based on cow calibration groups. Training and testing sets were modified in different scenarios according to disease incidence, the quantitative-genetic background of the trait (h² = 0.30 and h² = 0.10), and the genomic architecture [725 quantitative trait loci (QTL) and 290 QTL, populations with high and low levels of linkage disequilibrium (LD)]. For all scenarios, 10,005 SNP (depicting a low-density 10K SNP chip) and 50,025 SNP (depicting a 50K SNP chip) were evenly spaced along 29 chromosomes. Training and testing sets included 20,000 cows (4,000 sick, 16,000 healthy, disease incidence 20%) from the last 2 generations. Initially, 4,000 sick cows were assigned to the testing set, and the remaining 16,000 healthy cows represented the training set. In the ongoing allocation schemes, the number of sick cows in the training set increased stepwise by moving 10% of the sick animals from the testing set to the training set, and vice versa. The size of the training and testing sets was kept constant. Evaluation criteria for both GBLUP and RF were the correlations between genomic breeding values and true breeding values (prediction accuracy), and the area under the receiver operating characteristic curve (AUROC). Prediction accuracy and AUROC increased for both methods and all scenarios as increasing percentages of sick cows were allocated to the training set. The highest prediction accuracies were observed for disease incidences in training sets that reflected the population disease incidence of 0.20. For this allocation scheme, the largest prediction accuracies of 0.53 for RF and of 0.51 for GBLUP, and the largest AUROC of 0.66 for RF and of 0.64 for GBLUP, were achieved using 50,025 SNP, a heritability of 0.30, and 725 QTL. A decrease in heritability from 0.30 to 0.10 and a reduction in the number of QTL from 725 to 290 were associated with decreasing prediction accuracy and decreasing AUROC for all scenarios. This decrease was more pronounced for RF. Also, the increase of LD had a stronger effect on RF results than on GBLUP results. The highest prediction accuracy from the low LD scenario was 0.30 from RF and 0.36 from GBLUP, and increased to 0.39 for both methods in the high LD population. Random forest successfully identified important SNP in close map distance to QTL explaining a high proportion of the phenotypic trait variation.
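
The allocation scheme and the two evaluation criteria described in the abstract can be illustrated with a short Python sketch. This is not the authors' code. It assumes that each step moves 10% of the 4,000 sick cows (400 animals) into the training set and swaps the same number of healthy cows into the testing set, which is one reading of "and vice versa" that keeps both set sizes constant. The function names `allocation_schedule`, `pearson`, and `auroc` are hypothetical, chosen only for this illustration.

```python
import math

def allocation_schedule(n_sick=4000, n_healthy=16000, n_steps=10):
    """Return (step, sick_in_training, training_incidence, testing_incidence).

    Assumption: the training set always holds n_healthy animals and the
    testing set always holds n_sick animals; each step swaps an equal
    number of sick and healthy cows between the two sets.
    """
    per_step = n_sick // n_steps          # 10% of the sick cows per step
    rows = []
    for k in range(n_steps + 1):
        sick_train = k * per_step         # sick cows now in the training set
        sick_test = n_sick - sick_train   # sick cows left in the testing set
        rows.append((k, sick_train,
                     sick_train / n_healthy,   # training-set incidence
                     sick_test / n_sick))      # testing-set incidence
    return rows

def pearson(x, y):
    """Pearson correlation, used in the abstract as 'prediction accuracy'
    between genomic breeding values and true breeding values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen sick cow (label 1)
    receives a higher score than a randomly chosen healthy cow (label 0);
    ties count as one half. Quadratic time, fine for a small illustration."""
    sick = [s for s, y in zip(scores, labels) if y == 1]
    healthy = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in sick for n in healthy)
    return wins / (len(sick) * len(healthy))

if __name__ == "__main__":
    for step, sick_train, inc_train, inc_test in allocation_schedule():
        print(step, sick_train, round(inc_train, 3), round(inc_test, 3))
```

Under these assumptions, the training-set incidence rises in steps of 2.5 percentage points and reaches the population incidence of 0.20 at step 8 (3,200 sick cows among 16,000 training animals), the allocation for which the abstract reports the highest prediction accuracies.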

Similar articles

1. Random forest estimation of genomic breeding values for disease susceptibility over different disease incidences and genomic architectures in simulated cow calibration groups.
   J Dairy Sci. 2016 Sep;99(9):7261-7273. doi: 10.3168/jds.2016-10887. Epub 2016 Jun 22.
2. Genomic breeding values, SNP effects and gene identification for disease traits in cow training sets.
   Anim Genet. 2018 Jun;49(3):178-192. doi: 10.1111/age.12661. Epub 2018 Apr 6.
3. Empirical and deterministic accuracies of across-population genomic prediction.
   Genet Sel Evol. 2015 Feb 6;47(1):5. doi: 10.1186/s12711-014-0086-0.
4. Use of a Bayesian model including QTL markers increases prediction reliability when test animals are distant from the reference population.
   J Dairy Sci. 2019 Aug;102(8):7237-7247. doi: 10.3168/jds.2018-15815. Epub 2019 May 31.
5. Using markers with large effect in genetic and genomic predictions.
   J Anim Sci. 2017 Jan;95(1):59-71. doi: 10.2527/jas.2016.0754.
6. Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations.
   BMC Genet. 2015 Jul 19;16:87. doi: 10.1186/s12863-015-0252-6.
7. Design of a low-density SNP chip for the main Australian sheep breeds and its effect on imputation and genomic prediction accuracy.
   Anim Genet. 2015 Oct;46(5):544-56. doi: 10.1111/age.12340. Epub 2015 Sep 11.
8. Strategy for the simulation and analysis of longitudinal phenotypic and genomic data in the context of a temperature × humidity-dependent covariate.
   J Dairy Sci. 2014;97(4):2444-54. doi: 10.3168/jds.2013-7143. Epub 2014 Jan 31.
9. The effect of using cow genomic information on accuracy and bias of genomic breeding values in a simulated Holstein dairy cattle population.
   J Dairy Sci. 2018 Jun;101(6):5166-5176. doi: 10.3168/jds.2017-12999. Epub 2018 Mar 28.
10. Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle.
    J Anim Sci. 2013 Jul;91(7):3088-104. doi: 10.2527/jas.2012-5827. Epub 2013 May 8.

Cited by

1. Performance Comparison of Genomic Best Linear Unbiased Prediction and Four Machine Learning Models for Estimating Genomic Breeding Values in Working Dogs.
   Animals (Basel). 2025 Feb 2;15(3):408. doi: 10.3390/ani15030408.
2. Single-Step Breeding Value Estimations and Optimum Contribution Selection in Endangered Dual-Purpose German Black Pied Cattle (DSN) Using a Breed Specific SNP Chip.
   J Anim Breed Genet. 2025 Sep;142(5):560-570. doi: 10.1111/jbg.12929. Epub 2025 Feb 1.
3. Integrating Bioinformatics and Machine Learning for Genomic Prediction in Chickens.
   Genes (Basel). 2024 May 26;15(6):690. doi: 10.3390/genes15060690.
4. Body weight prediction of Belgian Blue crossbred using random forest.
   J Adv Vet Anim Res. 2024 Mar 31;11(1):181-184. doi: 10.5455/javar.2024.k763. eCollection 2024 Mar.
5. A review of machine learning models applied to genomic prediction in animal breeding.
   Front Genet. 2023 Sep 6;14:1150596. doi: 10.3389/fgene.2023.1150596. eCollection 2023.
6. A zero altered Poisson random forest model for genomic-enabled prediction.
   G3 (Bethesda). 2021 Feb 9;11(2). doi: 10.1093/g3journal/jkaa057.
7. Genome-wide associations and detection of potential candidate genes for direct genetic and maternal genetic effects influencing dairy cattle body weight at different ages.
   Genet Sel Evol. 2019 Feb 6;51(1):4. doi: 10.1186/s12711-018-0444-4.