Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle.

Affiliations

Wageningen University and Research Animal Breeding and Genomics, PO Box 338, 6700 AH Wageningen, the Netherlands.

Publication information

J Dairy Sci. 2019 Oct;102(10):9409-9421. doi: 10.3168/jds.2019-16295. Epub 2019 Aug 22.

Abstract

In this study, we compared multiple logistic regression, a linear method, with naive Bayes and random forest, 2 nonlinear machine-learning methods. We used all 3 methods to predict individual survival to second lactation in dairy heifers. The data set used for prediction contained 6,847 heifers, born between January 2012 and June 2013, with known survival outcomes. Each animal had 50 genomic estimated breeding values available at birth and up to 65 phenotypic variables that accumulated over time. Survival was predicted at 5 moments in life: at birth, at 18 mo, at first calving, at 6 wk after first calving, and at 200 d after first calving. The data sets were randomly split into 70% training and 30% testing sets, and model performance was evaluated with 20-fold validation. The methods were compared for accuracy, sensitivity, specificity, area under the curve (AUC) value, contrasts between groups for the prediction outcomes, and increase in surviving animals in a practical scenario. At birth and 18 mo, all methods had overlapping performance; no method significantly outperformed the others. At first calving, 6 wk after first calving, and 200 d after first calving, random forest and naive Bayes had overlapping performance, and both machine-learning methods outperformed multiple logistic regression. Overall, naive Bayes had the highest average AUC at all decision points before 200 d after first calving, and random forest had the highest AUC at 200 d after first calving. All methods obtained similar increases in survival in the practical scenario. Despite this, the methods appeared to predict the survival of individual heifers differently. All methods improved over time, but the changes in mean model outcomes for surviving and non-surviving animals differed by method. Furthermore, the correlations of individual predictions between methods ranged from r = 0.417 to r = 0.700; the lowest correlations were at first calving for all methods.
In short, all 3 methods were able to predict survival at a population level, because all methods improved survival in a practical scenario. However, predictions for individual animals differed considerably between methods.
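
The comparison criteria named in the abstract (accuracy, sensitivity, specificity, AUC, and the correlation of individual predictions between methods) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that survival is coded 1 and non-survival 0, that a probability cutoff of 0.5 defines a predicted survivor, and that "20-fold validation" means 20 repeated random 70/30 splits; the abstract confirms none of these. All function names and the toy numbers are hypothetical.

```python
import random


def confusion_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity and specificity; 1 = survived (assumed coding).

    Needs at least one animal of each class, otherwise it divides by zero.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0  # 0.5 cutoff is an assumption
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of survivors predicted to survive
        "specificity": tn / (tn + fp),  # share of non-survivors predicted not to
    }


def auc(y_true, y_prob):
    """AUC as the chance that a random survivor outscores a random non-survivor.

    Ties count as half a win. Quadratic in the number of animals, which is
    acceptable for a sketch.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation between two methods' predictions for the same animals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def repeated_splits(n_animals, n_repeats=20, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated random 70/30 splits."""
    rng = random.Random(seed)
    indices = list(range(n_animals))
    cut = int(train_fraction * n_animals)
    for _ in range(n_repeats):
        rng.shuffle(indices)
        yield indices[:cut], indices[cut:]


# Toy data, made up for illustration only (not taken from the study).
survived = [1, 1, 1, 0, 0, 1, 0, 1]
method_a = [0.90, 0.80, 0.40, 0.30, 0.60, 0.70, 0.20, 0.55]
method_b = [0.70, 0.90, 0.60, 0.50, 0.40, 0.50, 0.30, 0.80]

print(confusion_metrics(survived, method_a))
print("AUC, method A:", auc(survived, method_a))
print("r between methods:", pearson_r(method_a, method_b))
```

Two methods can reach similar AUC values on the same test set while `pearson_r` between their per-animal predictions stays well below 1, which is the pattern the abstract reports: comparable performance at population level, but different predictions for individual heifers.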
