Lippolis Antonio, Polo Pamela Vega, de Sousa Guilherme, Dechesne Annemarie, Pouvreau Laurice, Trindade Luisa M
Plant Breeding, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands.
Wageningen Food & Biobased Research, Wageningen University & Research, Bornse Weilanden 9, 6708WG, Wageningen, the Netherlands.
Food Chem X. 2024 Jun 26;23:101583. doi: 10.1016/j.fochx.2024.101583. eCollection 2024 Oct 30.
Near-infrared spectroscopy (NIRS) provides a high-throughput phenotyping technique to assist breeding for improved faba bean seed quality. We combined chemical analyses of protein and oil content (and oil composition) with NIRS through chemometrics, employing Partial Least Squares (PLS), Elastic Net (EN), Memory-based Learning (MBL), and Bayes B (BB) as prediction models. Protein was the most reliably predicted trait across field trials (R = 0.96–0.98), followed by oil (R = 0.82–0.86) and oleic acid (R = 0.31–0.68). Samples for training the models were selected using K-means clustering. The optimal statistical approach was compound-specific: PLS for protein (root mean squared error, RMSE = 0.46), BB for oil (RMSE = 0.067), and EN for oleic acid content (RMSE = 2.83). Simulations with reduced training sets showed that the effect on prediction accuracy depended on both the model and the compound. Using the shrinkage and variable-selection capabilities of EN and BB, several NIR regions were identified as highly informative for these compounds.
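The abstract names two generic steps that are easy to illustrate: choosing training samples with K-means clustering, and scoring predictions with R and RMSE. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' pipeline. It assumes that one training sample is taken per cluster (the sample nearest each centroid), that clustering runs on per-sample vectors such as preprocessed spectra or principal-component scores, and that R is the Pearson correlation between observed and predicted values. The farthest-first seeding and the helper names (`select_training_set`, `rmse`, `pearson_r`) are hypothetical choices made for this example. The PLS, EN, MBL and BB models themselves are not reproduced here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two sample vectors (e.g. spectra or PC scores)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points: List[Vector], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption): start at point 0, then repeatedly
    add the point farthest from all centres chosen so far."""
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    return centres


def kmeans(points: List[Vector], k: int, max_iter: int = 100):
    """Plain Lloyd's K-means: alternate assignment and centroid-update steps."""
    centres = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centre to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


def select_training_set(points: List[Vector], k: int) -> List[int]:
    """Hypothetical selection rule: the index of the sample nearest each cluster centre."""
    centres, labels = kmeans(points, k)
    chosen = []
    for j, c in enumerate(centres):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if idx:
            chosen.append(min(idx, key=lambda i: sq_dist(points[i], c)))
    return sorted(chosen)


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between reference values and model predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between reference values and model predictions."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


if __name__ == "__main__":
    # Toy 2-D "spectra" forming three well-separated groups.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
    print(select_training_set(pts, 3))  # → [0, 3, 6]
```

Picking one representative per cluster spreads the training set across the spectral space instead of drawing it at random. This is the usual motivation for cluster-based calibration sampling. Real spectra would normally be preprocessed, and often reduced by PCA, before clustering.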