Institute of Field and Vegetable Crops, Novi Sad, Serbia.
Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia.
Sci Rep. 2023 Oct 17;13(1):17611. doi: 10.1038/s41598-023-44999-3.
Due to the increased demand for sunflower production, a central breeding objective is to intensify the development of highly productive oilseed hybrids to satisfy the edible oil industry. Sunflower Oil Yield Prediction (SOYP) using machine learning (ML) algorithms can help breeders identify desirable new hybrids with high oil yield, along with their characteristics. In this study, we developed ML models to predict oil yield using two sets of features, and we evaluated which features are most relevant for accurate SOYP. The ML algorithms compared were Artificial Neural Network (ANN), Support Vector Regression, K-Nearest Neighbour, and Random Forest Regressor (RFR). The dataset consisted of samples for 1250 hybrids, of which 70% were randomly selected to train the model and the remaining 30% were used to test the model and assess its performance. Based on the MAE, MSE, RMSE and R² evaluation metrics, RFR consistently outperformed the other algorithms on all datasets, reaching a peak R² of 0.92 in 2019. In contrast, ANN recorded the lowest MAE, reaching 65 in 2018. The study revealed that, in addition to seed yield, the following hybrid characteristics were important for SOYP: resistance to broomrape (Or), resistance to downy mildew (Pl), and maturity. It also showed that the locality feature could be used to estimate sunflower oil yield, but that it is highly dependent on weather conditions, which affect oil content and seed yield. To our knowledge, this is the first study in which ML was used for sunflower oil yield prediction. The results indicate that ML has great potential not only for oil yield prediction but also for the selection of parental lines for hybrid production. RFR was found to be the most effective algorithm and, together with the locality feature, will be further evaluated as an alternative method for genotypic selection.
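The evaluation protocol described in the abstract (a random 70/30 split followed by MAE, MSE, RMSE and R² scoring) can be illustrated with a minimal sketch. The abstract does not state which software the authors used, so this example relies only on the Python standard library. The helper names `train_test_split` and `regression_metrics`, the seed, and all numeric values are hypothetical and synthetic. They are not data or results from the study.

```python
import math
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Synthetic stand-in for the 1250 hybrid samples (IDs only, not real data).
hybrids = list(range(1250))
train, test = train_test_split(hybrids)
print(len(train), len(test))  # 875 375

# Made-up observed and predicted oil yields, for illustration only.
y_true = [1000.0, 1200.0, 900.0, 1100.0]
y_pred = [1050.0, 1150.0, 950.0, 1050.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

In this toy case every prediction is off by 50, so MAE and RMSE are both 50 and R² is 0.8. In practice, a model such as a random forest regressor would be fitted on `train` and its predictions on `test` passed to `regression_metrics`.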