
A data-driven prediction of lifetime resilience of dairy cows using commercial sensor data collected during first lactation.

Affiliations

Wageningen University and Research, Animal Health and Welfare, PO Box 338, 6700 AH, Wageningen, the Netherlands.

Wageningen University and Research, Animal Breeding and Genomics, PO Box 338, 6700 AH, Wageningen, the Netherlands.

Publication details

J Dairy Sci. 2021 Nov;104(11):11759-11769. doi: 10.3168/jds.2021-20413. Epub 2021 Aug 26.

Abstract

Reliable prediction of lifetime resilience early in life can help dairy farmers make better management decisions. Several studies have shown that time-series sensor data can be used to predict lifetime resilience rankings. However, such predictions generally require translating the sensor data into biologically meaningful sensor features, which involves careful feature definition and extensive preprocessing. The objective of this study was to test the hypothesis that data-driven random forest algorithms can equal or improve on ordinal logistic regression in predicting lifetime resilience scores, while requiring considerably less data preprocessing. We tested this by developing models that predict a cow's lifetime resilience early in her productive life from first-lactation sensor data. We used an existing data set from a Dutch experimental herd containing culled cows for which birth dates, insemination dates, calving dates, culling dates, and health treatments were available to calculate lifetime resilience scores. In addition, 4 types of first-lactation sensor data, converted to daily aggregated values, were available: milk yield, body weight, activity, and rumination. For each sensor, 14 sensor features were calculated, some based on absolute daily values and others on values relative to the herd average. First, we predicted lifetime resilience rank with stepwise logistic regression, using the sensor features as predictors and P < 0.2 as the selection cut-off. Next, we applied a random forest to the 6 features that remained in the final logistic regression model. We then applied a random forest to all sensor features, and finally a random forest with the daily aggregated values as features. All models were validated with stratified 10-fold cross-validation, with 90% of the records in the training set and 10% in the validation set.
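The stratified 10-fold validation scheme can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the authors' code: the function name `stratified_k_fold`, the three class labels `low`/`medium`/`high`, and the round-robin assignment are assumptions. Within each resilience class, cow indices are shuffled and dealt over k folds, so every fold keeps roughly the class proportions of the full data set and serves once as the 10% validation set while the other 90% is used for training.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Hypothetical helper: split record indices into k stratified folds.

    Returns a list of k (train_indices, validation_indices) pairs. Indices of
    each class are shuffled and dealt round-robin over the folds, so every
    fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        # Continue dealing where the previous class stopped to balance fold sizes.
        offset += len(indices)

    splits = []
    for f in range(k):
        validation = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, validation))
    return splits


# Hypothetical herd of 100 culled cows with three assumed resilience classes.
labels = ["low"] * 30 + ["medium"] * 40 + ["high"] * 30
for train, validation in stratified_k_fold(labels):
    # Each fold: 90 training cows, 10 validation cows (3 low, 4 medium, 3 high).
    print(len(train), len(validation))
```

In practice a library routine such as scikit-learn's `StratifiedKFold` does the same job; the hand-rolled version only makes the stratification step explicit.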
Model performance, expressed as the percentage of correctly classified cows (accuracy ± SD) and the percentage of critically misclassified cows (i.e., high classified as low and vice versa), was 45.1 ± 8.1% and 10.8% for the ordinal logistic regression model, 45.7 ± 8.4% and 16.0% for the random forest using the same 6 features as the logistic regression model, 48.4 ± 6.7% and 10.0% for the random forest with all sensor features, and 50.5 ± 6.3% and 8.4% for the random forest with daily sensor values. The random forest with daily sensor values also revealed that data collected in the early and late stages of first lactation seem to be more important for prediction than data from mid lactation. Model accuracies did not differ significantly, but the percentage of critically misclassified cows was significantly higher for the second model (the random forest with the 6 logistic regression features) than for the other models. We concluded that a data-driven random forest algorithm with daily aggregated sensor data as input can predict lifetime resilience classification with an overall accuracy of ~50%, and predicts at least as well as models with sensor features as input.
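The two performance measures can be computed as in the sketch below. It assumes three ordinal classes (`low`, `medium`, `high`; the abstract names only the two extremes), computes accuracy per validation fold to obtain a mean ± SD, and pools all folds for the critical misclassification rate, since no SD is reported for that measure. The function names, class labels, and pooling choice are assumptions, not the authors' published procedure, and the toy predictions are made up.

```python
from statistics import mean, stdev

# Assumed class labels; "critical" errors confuse the two extreme classes.
CRITICAL_PAIRS = {("high", "low"), ("low", "high")}


def accuracy(y_true, y_pred):
    """Fraction of cows whose predicted class equals the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def critical_misclassification(y_true, y_pred):
    """Fraction of cows predicted in the opposite extreme class."""
    return sum((t, p) in CRITICAL_PAIRS for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(folds):
    """Hypothetical summary over (y_true, y_pred) pairs, one per validation fold.

    Returns mean accuracy, SD of accuracy across folds (needs >= 2 folds),
    and the critical misclassification rate pooled over all folds.
    """
    accuracies = [accuracy(t, p) for t, p in folds]
    all_true = [c for t, _ in folds for c in t]
    all_pred = [c for _, p in folds for c in p]
    return mean(accuracies), stdev(accuracies), critical_misclassification(all_true, all_pred)


# Toy predictions for two folds (made-up data, not results from the study).
folds = [
    (["high", "low", "medium", "high"], ["high", "high", "medium", "low"]),
    (["low", "medium", "high", "low"], ["low", "medium", "medium", "low"]),
]
acc_mean, acc_sd, critical = summarize(folds)
print(f"accuracy {acc_mean:.1%} ± {acc_sd:.1%}, critical {critical:.1%}")
# → accuracy 62.5% ± 17.7%, critical 25.0%
```

Pooling the critical errors over all folds is one reasonable choice when only a single percentage is reported; averaging per-fold rates would give the same value here only because the folds are equally sized.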
