Leveraging unsupervised machine learning techniques for detecting outliers in the daily milk yield data of dairy cows.

Author information

Higaki Shogo, Freitas Eduardo Noronha de Andrade, Negreiro Ariana, Dórea João R R, Cabrera Victor E

Affiliations

National Institute of Animal Health, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-0856, Japan; Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706.

Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706; Department of Informatic, Federal Institute of Goiás, Goiânia, Goiás 74130-012, Brazil.

Publication information

J Dairy Sci. 2025 Sep;108(9):9696-9711. doi: 10.3168/jds.2024-25889. Epub 2025 Jul 16.

Abstract

The lactation curve is essential for developing effective feeding plans, optimizing breeding, and strategizing milk production for dairy farms. However, health disorders, as well as external factors such as heat stress, dietary changes, and certain management practices, can cause perturbations (temporary drops in milk yield) that shift the fitted lactation curve downward, making it difficult to accurately estimate the potential lactation ability of dairy cows. This study aims to evaluate the applicability of unsupervised machine learning techniques for detecting outliers in daily milk yield data and estimating the expected lactation curve in the absence of perturbations, referred to as the unperturbed lactation curve (ULC). Using the Wood model as the baseline lactation curve, we compared ULC derived from 3 unsupervised machine learning models (UMLM), specifically one-class support vector machines, isolation forest, and local outlier factor, with those from 2 previously proposed models: the perturbed lactation model (PLM) and the iterative Wood model (IWM). We first conducted a simulation study using 1,000 simulated lactations over a 305-d period, each including 1 to 15 perturbations (mean ± SD: 4.00 ± 1.46), to assess perturbation detection performance. Across all UMLM, sensitivities (∼61%), precisions (∼82%), and their harmonic means (F scores, ∼70%) did not differ significantly. The UMLM outperformed the baseline Wood model in sensitivity (Wood: 51.5%) and F score (Wood: 64.2%) while maintaining comparable precision (Wood: 83.8%). Their F scores also exceeded those of the PLM (53.2%) and IWM (66.8%), indicating more balanced curve adjustment and improved perturbation detection. We then applied the models to observed daily milk yield data from 2,831 lactation records of 1,636 Holstein cows collected over a 10-year period at the University of Wisconsin-Madison Agricultural Research Station.
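The sensitivity, precision, and F score figures reported for the simulation study are linked by the harmonic mean. The following is a minimal sketch of that arithmetic. The function names and the day-level scoring of flagged days are illustrative assumptions, since the abstract does not state how detections were matched to perturbations.

```python
def f_score(sensitivity, precision):
    """Harmonic mean of sensitivity (recall) and precision."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * sensitivity * precision / (sensitivity + precision)


def detection_metrics(true_days, flagged_days):
    """Day-level sensitivity, precision, and F score.

    Scoring per day is an assumption made for illustration only.
    """
    true_days, flagged_days = set(true_days), set(flagged_days)
    tp = len(true_days & flagged_days)  # correctly flagged days
    sens = tp / len(true_days) if true_days else 0.0
    prec = tp / len(flagged_days) if flagged_days else 0.0
    return sens, prec, f_score(sens, prec)


# Approximate UMLM values quoted in the abstract: about 61% and 82%.
umlm_f = f_score(0.61, 0.82)

# Hypothetical example: 10 perturbed days, 8 days flagged, 6 of them correct.
sens, prec, f = detection_metrics(
    range(100, 110), [100, 101, 102, 103, 104, 105, 200, 201]
)
```

With a sensitivity of 0.61 and a precision of 0.82, `f_score` returns roughly 0.70, in line with the F score reported for the UMLM.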
The comparison focused on the goodness-of-fit of ULC, computational efficiency, curve shape, and the validity of identified perturbations. The UMLM demonstrated relatively high computational efficiency in establishing the ULC, and these ULC showed better goodness-of-fit and shapes more consistent with the baseline Wood curve than the PLM and IWM. The upward shifts in the ULC from the UMLM were more conservative than those from the IWM and PLM, yet seemed reasonable based on previous reports on the impact of health disorders on milk yield. Additionally, these upward shifts by the UMLM may help identify potential perturbations that went undetected with the baseline Wood curve. In contrast, the PLM and IWM showed limitations in detecting potential perturbations, especially during early lactation. These findings suggest that unsupervised machine learning techniques can effectively detect potential outliers in daily milk yield data and adequately estimate the expected lactation curve in the absence of perturbations. However, the generalizability of the findings may be limited by the use of data from only Holstein cows at a single farm and the absence of health, environmental, and management records. Moreover, the current UMLM do not account for fixed effects (e.g., breed, parity, calving season) or long-term impacts of health disorders, which may hinder accurate lactation curve modeling. Future studies should consider incorporating more flexible modeling approaches and multifarm datasets with detailed background records.
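The general idea of a ULC can be illustrated by fitting a Wood curve, flagging unusually low days, and refitting on the remaining days. The sketch below is not the authors' pipeline. It assumes the standard Wood form y(t) = a·t^b·e^(−ct), fits it by log-linear least squares, and uses a simple median/MAD rule on residuals as a stand-in for the one-class SVM, isolation forest, and local outlier factor detectors named in the abstract. The data are synthetic, and all names, parameters, and thresholds are hypothetical.

```python
import math
import statistics


def wood(t, a, b, c):
    """Wood lactation curve (assumed standard form): a * t**b * exp(-c*t)."""
    return a * t**b * math.exp(-c * t)


def solve3(A, v):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, v)]
    n = 3
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for k in range(i, n + 1):
                M[r][k] -= f * M[i][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_wood(days, yields):
    """Least-squares fit of ln y = ln a + b*ln t - c*t; returns (a, b, c)."""
    X = [[1.0, math.log(t), -float(t)] for t in days]
    z = [math.log(y) for y in yields]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xtz = [sum(r[i] * zi for r, zi in zip(X, z)) for i in range(3)]
    ln_a, b, c = solve3(XtX, Xtz)
    return math.exp(ln_a), b, c


def flag_low_outliers(residuals, k=3.0):
    """Stand-in detector (assumption): flag residuals far below the median."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    cutoff = med - k * 1.4826 * mad
    return [r < cutoff for r in residuals]


def unperturbed_curve(days, yields, k=3.0):
    """Fit the baseline curve, drop flagged days, and refit."""
    base = fit_wood(days, yields)
    resid = [y - wood(t, *base) for t, y in zip(days, yields)]
    flags = flag_low_outliers(resid, k)
    kept = [(t, y) for t, y, f in zip(days, yields, flags) if not f]
    ulc = fit_wood([t for t, _ in kept], [y for _, y in kept])
    return base, ulc, flags


# Synthetic 305-day lactation with one 20-day, 30% drop (hypothetical values).
TRUE = (20.0, 0.2, 0.004)
days = list(range(1, 306))
yields = [
    wood(t, *TRUE)
    * (1 + 0.02 * math.sin(7 * t))          # deterministic pseudo-noise
    * (0.7 if 100 <= t < 120 else 1.0)      # simulated perturbation
    for t in days
]
base, ulc, flags = unperturbed_curve(days, yields)
```

In this toy setting the refitted curve lies above the baseline fit around the simulated drop, which mirrors the upward shift of the ULC described above. Trying a different detector would only require replacing `flag_low_outliers`.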

