Suitability of different machine learning algorithms for the classification of the proportion of grassland-based forages at the herd level using mid-infrared spectral information from routine milk control.

Author information

Birkinshaw A, Sutter M, Nussbaum M, Kreuzer M, Reidy B

Affiliations

Bern University of Applied Sciences (BFH), School for Agricultural, Forest and Food Sciences (HAFL), CH-3052 Zollikofen, Switzerland.

Bern University of Applied Sciences (BFH), School for Agricultural, Forest and Food Sciences (HAFL), CH-3052 Zollikofen, Switzerland; University of Utrecht, Department of Physical Geography, 3584 CB Utrecht, the Netherlands.

Publication information

J Dairy Sci. 2024 Dec;107(12):10724-10737. doi: 10.3168/jds.2024-25090. Epub 2024 Aug 30.

Abstract

As the call for an international standard for milk from grassland-based production systems continues to grow, so too do the monitoring and evaluation policies surrounding this topic. Individual stipulations by countries and milk producers for marketing milk under their own grass-fed labels include a compulsory number of grazing days per year (ranging from 120 d for certain labels to 180 d for others), a specified amount of herbage in the diet, or a prescribed dietary proportion of grassland-based forages (GBF) fed and produced on-farm. Because these multifarious policy and label requirements are laborious and costly to monitor on-farm, fast, economical proxies for verifying, from the final product, the proportion of GBF consumed by the cows would be advantageous. With this in mind, we used readily available mid-infrared spectral data (n = 1,132 spectra) from routine milk controls to develop binary classification models for 4 main feed groups from a primarily forage-based diet: total GBF (≥50% [n = 955], ≥75% [n = 599], ≥85% [n = 356]), pasture (≥20% [n = 451], ≥50% [n = 284], ≥70% [n = 152]), fresh herbage (pasture + fresh herbage indoor feeding; ≥20% [n = 517], ≥50% [n = 325], ≥70% [n = 182]), and whole plant corn (fresh + conserved; ≥10% [n = 646], ≥30% [n = 187]), with the latter as a negative control. We compared 4 machine learning methods to assess which statistical model best discriminates these classes. Three of these methods have not yet been tested for herd-level classification of dietary proportions, and all 4 follow completely different approaches: least absolute shrinkage and selection operator (LASSO), partial least squares discriminant analysis (PLS-DA), random forest (RF), and support vector machines (SVM). Seasonality has been missing from previous models classifying the dietary proportion of herbage. As grazing and fresh herbage indoor feeding are highly dependent on the season, we developed an indicator to incorporate seasonality into our models in a consistent, unbiased manner. We also tested 3 sets of covariates: the first included only data derived from the mid-infrared spectra; the second included the spectra-derived data plus seasonality indices; and the third included the spectra-derived data, seasonality indices, and additional herd-specific information (days in milk [DIM], breed, and parity). Of the 4 machine learning algorithms tested for the binary classification of GBF proportion at herd level, LASSO and PLS-DA performed best according to the evaluation metrics; however, the RF and SVM models trailed the best-performing models only slightly in each feed category. Our best-performing model, the LASSO model containing seasonality indices and herd-specific information, classified total GBF ≥50% with an accuracy of 78.6%, precision of 85.1%, sensitivity of 90.6%, specificity of 14.1%, and F1 score (harmonic mean of precision and sensitivity) of 87.7%; the PLS-DA model performed very similarly. Our results suggest that, in general, LASSO and PLS-DA algorithms perform better for dietary GBF classification than RF or SVM algorithms.
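The evaluation metrics quoted in the abstract follow the standard confusion-matrix definitions for a binary classifier. The Python sketch below is illustrative only and is not the authors' code: the helper names `binarize` and `binary_metrics`, the toy feed proportions, and the predicted labels are all hypothetical, and it assumes class 1 means "at or above the threshold" (e.g. total GBF ≥50%).

```python
from typing import Dict, List, Sequence


def binarize(proportions: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical helper: label a herd-level diet 1 if its feed proportion
    (as a fraction, e.g. 0.62 for 62% GBF) is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in proportions]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: standard metrics from the binary confusion matrix
    (1 = positive class). Returns 0.0 where a denominator would be zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    # F1 is the harmonic mean of precision and sensitivity
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy example: 6 herds, threshold of 50% total GBF, made-up predictions
y_true = binarize([0.9, 0.8, 0.6, 0.55, 0.4, 0.3], 0.5)
metrics = binary_metrics(y_true, [1, 1, 1, 0, 1, 0])
print(metrics)
```

In the toy example, 4 of 6 herds are classified correctly, giving accuracy of about 0.667, precision and sensitivity of 0.75, and specificity of 0.5. One plausible reading of the reported pattern is class imbalance: since 955 of the 1,132 spectra fall into the total GBF ≥50% class, a model can combine high accuracy, precision, and sensitivity with a specificity as low as 14.1%, because misclassifying most of the small negative class barely moves the overall accuracy.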
