Mota Lucio F M, Giannuzzi Diana, Bisutti Vittoria, Pegolo Sara, Trevisi Erminio, Schiavon Stefano, Gallo Luigi, Fineboym David, Katz Gil, Cecchinato Alessio
Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy.
J Dairy Sci. 2022 May;105(5):4237-4255. doi: 10.3168/jds.2021-21426. Epub 2022 Mar 10.
Cheese-making traits in dairy cattle are important to the dairy industry but are difficult to measure at the individual level because of limitations on collecting phenotypic information. Mid-infrared spectroscopy has its advantages, but it can only be used during monthly milk recordings. Recently, in-line devices for real-time analysis of milk quality have been developed. The AfiLab recording system (Afimilk) offers significant benefits because phenotypes can be collected from each cow at each milking session. The objective of this study was to assess the potential of integrating AfiLab real-time milk analyzer measures with a stacking ensemble learning technique, built on heterogeneous base learners, for the in-line daily monitoring of cheese-making traits in Holstein cattle. The broader aim was to develop a precision livestock farming system for monitoring the technological quality of milk. Data and samples for wet-laboratory analyses were collected from 499 Holstein cows belonging to 2 farms where the AfiLab system was installed. The traits of interest were 9 milk coagulation traits [3 milk coagulation properties (MCP) and 6 curd firming traits (CF)] and 7 cheese-making traits [3 cheese yield (CY) traits and 4 milk nutrient recovery in the curd (REC) traits]. The near-infrared AfiLab spectral data and on-farm information (days in milk and parity) were used to assess the predictive ability of different statistical methods [elastic net (EN), gradient boosting machine (GBM), extreme gradient boosting (XGBoost), and artificial neural network (ANN)] across different cross-validation scenarios. These statistical methods served as the base learners, which were then combined in a stacking ensemble.
Results indicate that including information on the cows (days in milk and parity) in the AfiLab infrared prediction increased its accuracy by 10.3% for traditional MCP, 13.8% for curd firming, 9.8% for CY, and 11.2% for REC traits compared with those obtained from near-infrared AfiLab alone. The statistical approaches exhibited high prediction accuracies (R) averaged across the cross-validation scenarios for traditional MCP (0.58 for ANN, 0.55 for EN and GBM, 0.52 for XGBoost, and 0.62 for stacking ensemble), CF (0.55 for ANN, 0.54 for EN and GBM, 0.53 for XGBoost, and 0.61 for stacking ensemble), and similar R averages for CY and REC (0.55 for ANN, 0.54 for EN and GBM, 0.53 for XGBoost, and 0.61 for stacking ensemble). The ANN approach was more accurate than the other base learners (EN, GBM, and XGBoost) and improved accuracy across cross-validation scenarios on average by 7% for traditional MCP, 5% for CF, 8% for CY, and 7% for REC. The stacking ensemble method improved prediction accuracy by 3% to 31% for traditional MCP, 2% to 26% for CF, 1% to 38% for CY traits, and 2% to 27% for REC traits compared with the base learners. The prediction accuracies of the different approaches evaluated tended to decrease from the 10-fold cross-validation to the independent validation scenario, although there was a smaller reduction in prediction accuracy with the stacking ensemble learning technique across all the cross-validation scenarios. Our results show that combining in-line on-farm information with stacking ensemble machine learning represents an effective alternative for obtaining robust daily predictions of milk cheese-making traits.
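The drop in accuracy between 10-fold cross-validation and independent validation can be sketched as follows. The abstract does not define the independent scenario, so this example assumes it means training on one farm and validating on the other. The data are synthetic and deliberately built so that the feature-trait relationship differs between farms; a single elastic net is used for brevity in place of the full ensemble.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(1)

# Synthetic stand-in data (NOT the study data): two farms whose
# feature-trait relationship differs in the second feature.
n_per_farm, n_features = 150, 10
farm = np.repeat([0, 1], n_per_farm)
X = rng.normal(size=(2 * n_per_farm, n_features))
slope = np.where(farm == 0, 0.8, -0.8)
y = X[:, 0] + slope * X[:, 1] + rng.normal(scale=0.7, size=2 * n_per_farm)

model = ElasticNet(alpha=0.05, l1_ratio=0.5)


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(observed, predicted)[0, 1])


# Scenario 1: 10-fold cross-validation with cows from both farms in every fold.
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
r_kfold = pearson_r(y, cross_val_predict(model, X, y, cv=kfold))

# Scenario 2 (assumed definition): train on one farm, validate on the other.
r_by_farm = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=farm):
    model.fit(X[train_idx], y[train_idx])
    r_by_farm.append(pearson_r(y[test_idx], model.predict(X[test_idx])))
r_independent = float(np.mean(r_by_farm))

print("10-fold CV R:", round(r_kfold, 3))
print("leave-one-farm-out R:", round(r_independent, 3))
```

Random folds let the model see both farms during training, so they tend to give a more optimistic estimate than a split that holds out an entire farm.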