Kiouvrekis Yiannis, Vasileiou Natalia G C, Katsarou Eleni I, Lianou Daphne T, Michael Charalambia K, Zikas Sotiris, Katsafadou Angeliki I, Bourganou Maria V, Liagka Dimitra V, Chatzopoulos Dimitris C, Fthenakis George C
Faculty of Public and One Health, University of Thessaly, 43100 Karditsa, Greece.
School of Business, University of Nicosia, Nicosia 2417, Cyprus.
Animals (Basel). 2024 Aug 6;14(16):2295. doi: 10.3390/ani14162295.
The objective of the study was to develop a computational model for predicting the level of prevalence of mastitis in dairy sheep farms. Data for the construction of the model were obtained from a large Greece-wide field study of 111 farms. Unsupervised learning methodology was applied to group the data into two clusters based on 18 variables (17 independent variables related to health management practices applied in farms, climatological data at the locations of the farms, and the level of prevalence of subclinical mastitis as the target value). The K-means tool showed the highest significance for the classification of farms into two clusters for the construction of the computational model: the median (interquartile range) prevalence of subclinical mastitis among farms in the two clusters was 20.0% (15.8%) and 30.0% (16.0%) (P = 0.002). Supervised learning tools were subsequently used to predict the level of prevalence of the infection: decision trees, k-NN, neural networks, and support vector machines. For each of these, combinations of hyperparameters were employed; 83 models were produced, and 4150 assessments were made in total. A computational model obtained by means of support vector machines (kernel: '', regularization parameter C = 3) was selected. Thereafter, the model was assessed against the prevalence of subclinical mastitis in 373 records from sheep flocks unrelated to the ones employed for the selection of the model; correct classification was evaluated in each of 373 sets, each of which included a test (prediction) subset with one record that referred to the farm under assessment. The median (interquartile range) prevalence of the infection in farms classified by the model in each of the two categories was 10.4% (5.5%) and 36.3% (9.7%) (P < 0.0001).
The overall accuracy of the model for the results presented by the K-means tool was 94.1%; for the estimation of the level of prevalence (<25.0%/≥25.0%) in the farms, it was 96.3%. The findings of this study indicate that machine learning algorithms can be usefully employed in predicting the level of subclinical mastitis in dairy sheep farms. This can facilitate setting up appropriate health management measures for interventions in the farms.
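The evaluation described above, in which each set holds out one record as the test (prediction) subset, resembles a hold-one-out scheme, and a minimal sketch may make it concrete. Everything in the sketch is an assumption made for illustration: the toy records and their two features are invented, a 1-nearest-neighbour classifier stands in for the selected support vector machine to keep the example dependency-free, and the function names are hypothetical. Only the 25.0% cut-off is taken from the abstract. How the training part of each set was composed in the study is not stated here, so this is not a reproduction of the authors' procedure.

```python
from math import dist

THRESHOLD = 25.0  # prevalence cut-off (%) stated in the abstract


def label(prevalence):
    """Return 1 for prevalence >= 25.0%, otherwise 0."""
    return 1 if prevalence >= THRESHOLD else 0


def predict_1nn(train, query):
    """Predict the label of `query` from its nearest training record.

    A 1-nearest-neighbour rule is used here only as a stand-in classifier.
    """
    nearest = min(train, key=lambda rec: dist(rec[0], query))
    return nearest[1]


def hold_one_out_accuracy(records):
    """Hold out each record in turn, predict it from the rest, return accuracy."""
    correct = 0
    for i, (features, true_label) in enumerate(records):
        train = records[:i] + records[i + 1:]
        if predict_1nn(train, features) == true_label:
            correct += 1
    return correct / len(records)


# Hypothetical toy records: (two invented features, observed prevalence in %)
toy = [
    ((0.10, 0.20), 10.4),
    ((0.20, 0.10), 12.0),
    ((0.15, 0.25), 8.0),
    ((0.90, 0.80), 36.3),
    ((0.80, 0.90), 30.0),
    ((0.85, 0.95), 41.0),
]
records = [(features, label(prev)) for features, prev in toy]
accuracy = hold_one_out_accuracy(records)
print(f"accuracy: {accuracy:.3f}")
```

With real farm data, the stand-in classifier would be replaced by the selected model, and the fraction of correctly classified held-out records would correspond to the kind of accuracy figure reported above.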