Guo Rui, Dai Yongqiang, Hu Junjie
College of Information Science and Technology, Gansu Agricultural University, Lanzhou, China.
College of Veterinary Medicine, Gansu Agricultural University, Lanzhou, China.
Front Vet Sci. 2025 Apr 24;12:1575525. doi: 10.3389/fvets.2025.1575525. eCollection 2025.
Mastitis in dairy cows is a major challenge for the global dairy industry: it reduces both the quality and the quantity of milk that dairy enterprises produce and causes them severe economic losses. With growing public concern over food safety and the rational use of antibiotics, identifying cows at risk of disease early has become a pressing problem. Early warning is particularly important for subclinical mastitis, which lacks obvious external symptoms and is therefore harder to detect.
In this study, time-series prediction combined with machine learning was used to predict the risk of mastitis in dairy cows. The data were obtained from the production records of 4,000 dairy cows on a large farm in the Hexi region of Gansu. Time-series features were constructed from each cow's production indicators, such as milk yield, fat percentage, and protein percentage, over two consecutive months (April and May) in order to predict its health status in June. To fully exploit these time-series features, we designed a multidimensional feature set comprising raw indicator values, monthly change rates, and statistical features. After data preprocessing and sample balancing, data from 2,821 cows were selected for model training. Finally, the applicability of each model was assessed by comparing the prediction performance of six models: eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression, and Long Short-Term Memory network (LSTM).
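The abstract does not spell out how the change-rate and statistical features are computed. As a minimal sketch, assuming the change rate is the relative change from April to May and the statistical features are the two-month mean and population standard deviation, one cow's feature vector could be built as follows. The field names (`milk_yield`, `fat_pct`, `protein_pct`) and the `MonthlyRecord` and `build_features` helpers are hypothetical, not taken from the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical names for the three production indicators named in the abstract.
INDICATORS = ("milk_yield", "fat_pct", "protein_pct")


@dataclass
class MonthlyRecord:
    """One cow's production record for a single month (hypothetical schema)."""
    milk_yield: float
    fat_pct: float
    protein_pct: float


def build_features(april: MonthlyRecord, may: MonthlyRecord) -> dict[str, float]:
    """Build raw, change-rate, and statistical features from two monthly records."""
    feats: dict[str, float] = {}
    for name in INDICATORS:
        a = getattr(april, name)
        m = getattr(may, name)
        # Raw indicator values for each month.
        feats[f"{name}_apr"] = a
        feats[f"{name}_may"] = m
        # Assumed definition: relative change from April to May.
        # Returning 0.0 when April is zero is a hypothetical convention.
        feats[f"{name}_change_rate"] = (m - a) / a if a != 0 else 0.0
        # Assumed statistical features: two-month mean and population std dev.
        feats[f"{name}_mean"] = mean([a, m])
        feats[f"{name}_std"] = pstdev([a, m])
    return feats
```

For example, `build_features(MonthlyRecord(30.0, 4.0, 3.2), MonthlyRecord(27.0, 4.4, 3.3))` yields a milk-yield change rate of -0.1 and a fat-percentage standard deviation of about 0.2. With only two months per cow, the population standard deviation reduces to half the absolute month-to-month difference, so it carries information similar to the change rate but without its sign or its normalization by the April value.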
The XGBoost model performed best, achieving an area under the ROC curve (AUC) of 0.75 and an accuracy of 71.36%. Feature importance analysis identified three key temporal indicators that most strongly influenced the predictions: May milk yield (22.29%), standard deviation of fat percentage (20.27%), and fat percentage change rate (19.87%). SHapley Additive exPlanations (SHAP) analysis further confirmed the predictive value of these temporal features, giving dairy farm managers clearly defined monitoring priorities.
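The two headline metrics can be computed without any modelling library. The sketch below is an illustration, not the study's evaluation code: it computes AUC in its Mann-Whitney form (the probability that a randomly chosen positive cow receives a higher risk score than a randomly chosen negative one, with ties counted as half) and accuracy at an assumed decision threshold of 0.5. The labels and scores in the usage example are toy values.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney formulation). Labels are 1 for
    positive (at risk) and 0 for negative (healthy).
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    # Compare every positive score with every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of correct predictions after thresholding scores (threshold is assumed)."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(preds, y_true))
    return correct / len(y_true)
```

On the toy data `y = [0, 0, 1, 1, 0, 1]` with scores `[0.2, 0.6, 0.7, 0.9, 0.3, 0.4]`, `roc_auc` returns 8/9 (about 0.889) while `accuracy` returns 4/6 (about 0.667). The gap illustrates why both metrics are reported: AUC measures ranking quality across all thresholds, whereas accuracy depends on the single chosen cut-off. The pairwise loop is quadratic in the number of cows, which is still cheap at the scale of a few thousand animals.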
The XGBoost model demonstrates strong potential as an accurate predictive tool for subclinical mastitis in dairy cows. This study presents an effective early-warning approach through time-series modeling that offers significant practical value for mastitis prevention in dairy farm management.