Can unsupervised learning methods applied to milk recording big data provide new insights into dairy cow health?

Affiliations

University of Liège, Gembloux Agro-Bio Tech (ULiège-GxABT), 5030 Gembloux, Belgium.

Walloon Agricultural Research Center (CRA-W), 5030 Gembloux, Belgium.

Publication information

J Dairy Sci. 2022 Aug;105(8):6760-6772. doi: 10.3168/jds.2022-21975. Epub 2022 Jun 27.

Abstract

Among the dairy sector's current concerns, the assessment of global animal health status is a complex challenge. Its multidimensionality means that global monitoring tools are rarely considered. Instead, the detection of each specific disease is often studied separately and, due to financial and ethical issues, relies on small-scale data sets focusing on few biomarkers. Several studies have already been conducted using milk Fourier transform mid-infrared (FT-MIR) spectroscopy to detect mastitis and lameness or to quantify health-related biomarkers in milk or blood. Those studies are relevant, but they focus mainly on one biomarker or disease. To address both this narrow focus and the reliance on small-scale data sets, in this study we proposed a holistic approach using big data obtained from milk recording, including milk yield, somatic cell count, and 27 FT-MIR-based predictors related to milk composition and animal health status. Using 740,454 records collected from 114,536 first-parity Holstein cows in southern Belgium, we performed repeated unsupervised learning algorithms based on Ward's agglomerative hierarchical clustering method to find potentially interesting patterns. A divide-and-conquer approach was used to overcome the limitation of computational resources in clustering a relatively large data set. Five groups of records were identified. Differences observed in the fourth group suggested a relationship to metabolic disorders. The fifth group seemed to be related to mastitis. In a second step, we performed a partial least squares discriminant analysis (PLS-DA) to predict the probability of belonging to those specific groups for the entire data set. The obtained global accuracy was 0.77, and the balanced accuracies (i.e., the mean of sensitivity and specificity) of discriminating the fourth and fifth groups were 0.88 and 0.96, respectively. Then, a validation of the interpretation of those groups was performed using 204 milk and blood reference records.
The predicted probability of belonging to the metabolic disorders group had significant correlations of 0.54 with blood β-hydroxybutyrate, 0.44 with blood nonesterified fatty acids, -0.32 with blood glucose, -0.23 with milk glucose-6-phosphate, and 0.38 with milk isocitrate. In contrast, the predicted probability of belonging to the mastitis group had correlations of 0.69 with milk lactate dehydrogenase, 0.46 with milk N-acetyl-β-D-glucosaminidase, -0.18 with milk free glucose, and 0.16 with milk glucose-6-phosphate. Consequently, these results suggest that the obtained quantitative traits indirectly reflect some of the main health disorders in dairy farming and could be used to monitor dairy cows on a large scale. By using unsupervised learning on large-scale milk recording data and then validating the pattern using reference laboratory measures, we propose a new approach to quickly assess dairy cow health status.
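
The abstract defines balanced accuracy as the mean of sensitivity and specificity and reports it per group alongside a global accuracy. As an illustration only, the sketch below computes both metrics in a one-vs-rest fashion for a five-group classification. The function names and the toy labels are hypothetical and are not taken from the study; the authors' actual evaluation code is not described in the abstract.

```python
from typing import Sequence


def global_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of records whose predicted group equals the true group."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int], group: int) -> float:
    """Mean of sensitivity and specificity for `group` versus all other groups."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == group:
            if p == group:
                tp += 1  # record of the group, correctly assigned
            else:
                fn += 1  # record of the group, assigned elsewhere
        else:
            if p == group:
                fp += 1  # record of another group, wrongly assigned here
            else:
                tn += 1  # record of another group, correctly kept out
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one record inside and outside the group")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical toy labels for five groups (1..5); NOT data from the study.
y_true = [1, 1, 2, 3, 4, 4, 4, 5, 5, 5]
y_pred = [1, 2, 2, 3, 4, 4, 1, 5, 5, 4]

acc = global_accuracy(y_true, y_pred)
ba_group4 = balanced_accuracy(y_true, y_pred, group=4)
ba_group5 = balanced_accuracy(y_true, y_pred, group=5)
print(f"global accuracy: {acc:.2f}")
print(f"balanced accuracy, group 4: {ba_group4:.2f}")
print(f"balanced accuracy, group 5: {ba_group5:.2f}")
```

Because balanced accuracy weights sensitivity and specificity equally, it can be high for a small group even when the global accuracy across all five groups is moderate, which is consistent with the pattern of values reported above.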

