Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir, Srinagar, India.
Sci Rep. 2022 Nov 4;12(1):18726. doi: 10.1038/s41598-022-23499-w.
As the volume of data generated on farms grows, it is important to evaluate the potential of artificial intelligence for making farming predictions. This study was therefore undertaken to evaluate various machine learning (ML) algorithms using 52 years of data on sheep. The data were prepared before analysis, and breeding values were estimated using Best Linear Unbiased Prediction (BLUP). Twelve ML algorithms were evaluated for their ability to predict the breeding values. The variance inflation factor for all features selected through principal component analysis (PCA) was 1. For breeding value prediction, the correlation coefficients between true and predicted values were 0.852 for artificial neural networks, 0.742 for Bayesian ridge regression, 0.869 for classification and regression trees, 0.915 for the gradient boosting algorithm, 0.781 for K nearest neighbours, 0.746 for the multivariate adaptive regression splines (MARS) algorithm, 0.742 for polynomial regression, 0.746 for principal component regression (PCR), 0.917 for random forests, 0.777 for support vector machines, and 0.915 for the XGBoost algorithm. Random forests had the highest correlation coefficient (0.917). Among the prediction equations generated using ordinary least squares (OLS), the highest coefficient of determination was 0.569. In total, 12 machine learning models were developed for the prediction of breeding values in sheep in the present study. These results suggest that machine learning techniques can make predictions with reasonable accuracy and can thus be viable alternatives to conventional strategies for breeding value prediction.
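The two accuracy measures reported above, the correlation coefficient between true and predicted values and the coefficient of determination, can be illustrated with a minimal sketch. It assumes the standard textbook definitions (Pearson's r and R² = 1 − SS_res/SS_tot); the function names and the toy numbers are hypothetical and are not taken from the study's data or code.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between true and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_t * ss_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values standing in for BLUP-estimated breeding values
# (true) and model output (pred); every prediction is 0.5 too high.
true_bv = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_bv = [1.5, 2.5, 3.5, 4.5, 5.5]

print(pearson_r(true_bv, pred_bv))  # 1.0
print(r_squared(true_bv, pred_bv))  # 0.875
```

In this toy case the predictions are perfectly correlated with the true values yet carry a constant offset, so the correlation is 1.0 while the coefficient of determination is only 0.875. The two measures answer different questions, which is worth keeping in mind when comparing correlation coefficients with a coefficient of determination.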