Department of Animal Sciences, University of Wisconsin-Madison, Madison, WI.
DSM Produtos Nutricionais Brasil S.A., São Paulo, Brazil.
J Anim Sci. 2020 Apr 1;98(4). doi: 10.1093/jas/skaa089.
With agriculture rapidly becoming a data-driven field, it is imperative to extract useful information from large data collections to optimize production systems. We compared the efficacy of regression (linear regression or generalized linear regression [GLR] for continuous or categorical outcomes, respectively), random forests (RF), and multilayer neural networks (NN) to predict beef carcass weight (CW), age when finished (AS), fat deposition (FD), and carcass quality (CQ). The data analyzed contained information on over 4 million beef cattle from 5,204 farms, corresponding to 4.3% of Brazil's national production between 2014 and 2016. Explanatory variables were integrated from different data sources and encompassed animal traits, participation in a technical advising program, nutritional products sold to farms, economic variables related to beef production, month when finished, soil fertility, and climate at the location where animals were raised. The training set was composed of information collected in 2014 and 2015, while the testing set had information recorded in 2016. After parameter tuning for each algorithm, models were used to predict the testing set. The best model to predict CW and AS was RF (CW: predicted root mean square error = 0.65, R² = 0.61, and mean absolute error = 0.49; AS: accuracy = 28.7%, Cohen's kappa coefficient [Kappa] = 0.08), while the best approach for FD and CQ was GLR (FD: accuracy = 45.7%, Kappa = 0.05; CQ: accuracy = 58.7%, Kappa = 0.09). Across all models, there was a tendency for better performance with RF and regression and worse with NN. Animal category, nutritional plan, cattle sales price, participation in a technical advising program, and the climate and soil where animals were raised were deemed important for prediction of meat production and quality with regression and RF.
The development of strategies for prediction of livestock production using real-world large-scale data will be core to projecting future trends and optimizing the allocation of resources at all levels of the production chain, rendering animal production more sustainable. Although beef cattle production is a complex system, this analysis shows that by integrating different sources of data it is possible to forecast meat production and quality at the national level with moderate to high levels of accuracy.
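The evaluation protocol described in the abstract (train on 2014 and 2015, test on 2016, then score with root mean square error, R², mean absolute error, accuracy, and Cohen's kappa) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the `year` key, and all function names are hypothetical, and R² is computed here as the coefficient of determination, which may differ from the definition used in the paper.

```python
import math
from collections import Counter


def split_by_year(records, train_years, test_years, year_key="year"):
    """Temporal split: earlier years for training, a later year for testing.

    `records` is a list of dicts; `year_key` is a hypothetical field name.
    """
    train = [r for r in records if r[year_key] in train_years]
    test = [r for r in records if r[year_key] in test_years]
    return train, test


def rmse(y_true, y_pred):
    """Root mean square error for a continuous outcome (e.g. CW)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error for a continuous outcome."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true, y_pred):
    """Share of exactly matching class labels (e.g. AS, FD, CQ)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Cohen's kappa is reported alongside accuracy because it subtracts the agreement expected by chance. In this sketch, a classifier that always predicts the majority class of an imbalanced outcome can reach high accuracy while its kappa is exactly 0.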