Sousa Massaine Bandeira E, Filho Juraci Souza Sampaio, de Andrade Luciano Rogerio Braatz, de Oliveira Eder Jorge
Embrapa Mandioca e Fruticultura, Cruz das Almas, Bahia, Brazil.
Universidade Federal do Recôncavo da Bahia, Cruz das Almas, Bahia, Brazil.
Front Plant Sci. 2023 Jan 23;14:1089759. doi: 10.3389/fpls.2023.1089759. eCollection 2023.
Cassava (Manihot esculenta Crantz) starch consists of amylopectin and amylose, and its properties are determined by the proportion of these two polymers. Waxy starches contain at least 95% amylopectin. In the food industry, waxy starches are advantageous because their pastes are more stable against retrogradation, while high-amylose starches are used as resistant starches. This study aimed to associate near-infrared spectrophotometry (NIRS) spectra with the waxy phenotype in cassava seeds and to develop an accurate classification model for indirect selection of plants. A total of 1127 F2 seeds were obtained from controlled crosses between 77 F1 genotypes (wild-type, _). Seeds were individually identified, and spectral data were obtained by NIRS using a benchtop NIRFlex N-500 spectrometer and a portable SCiO device. Four classification models were assessed for identifying waxy cassava genotypes: k-nearest neighbor (KNN), C5.0 decision tree (CDT), parallel random forest (parRF), and eXtreme Gradient Boosting (XGB). Spectral data were divided into a training set (80%) and a testing set (20%). Accuracy based on the NIRFlex N-500 spectral data ranged from 0.86 (parRF) to 0.92 (XGB). The Kappa index followed a similar trend, with the lowest value for parRF (0.39) and the highest for XGB (0.71). For the SCiO device, accuracy (0.88-0.89) was similar among the four models evaluated. However, the Kappa index was lower than for the NIRFlex N-500, ranging from 0 (parRF) to 0.16 (KNN and CDT). Therefore, despite their high accuracy, the models based on SCiO spectra were unable to correctly distinguish waxy from non-waxy clones. A confusion matrix was constructed to show the classification results in the testing set. For both NIRS devices, the models were efficient at classifying non-waxy clones, with correct classification rates of 96-100%. However, the devices differed in their ability to predict the waxy genotype class. For the NIRFlex N-500, the percentage of waxy genotypes correctly classified ranged from 30% (parRF) to 70% (XGB). In general, the models tended to classify waxy genotypes as non-waxy, particularly with the SCiO device. NIRS can therefore be used for early selection of cassava seeds with a waxy phenotype.
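The gap between accuracy and the Kappa index reported above follows from class imbalance: when most seeds are non-waxy, a model that almost never predicts "waxy" still agrees with the true labels most of the time, and Kappa discounts that chance agreement. The sketch below computes both metrics from a 2x2 confusion matrix. The function name and all seed counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def accuracy_and_kappa(tn, fp, fn, tp):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix.

    tn: non-waxy seeds predicted non-waxy
    fp: non-waxy seeds predicted waxy
    fn: waxy seeds predicted non-waxy
    tp: waxy seeds predicted waxy
    """
    n = tn + fp + fn + tp
    observed = (tp + tn) / n  # observed agreement, i.e. accuracy

    # Agreement expected by chance, from the row and column marginals.
    chance_waxy = ((tp + fn) / n) * ((tp + fp) / n)
    chance_non_waxy = ((tn + fp) / n) * ((tn + fn) / n)
    expected = chance_waxy + chance_non_waxy

    if expected == 1.0:
        # Degenerate case: only one class present and predicted.
        return observed, 0.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical test set: 200 non-waxy and 25 waxy seeds (invented counts).

# Model A labels every seed as non-waxy.
acc_a, kappa_a = accuracy_and_kappa(tn=200, fp=0, fn=25, tp=0)

# Model B recovers 17 of the 25 waxy seeds with 4 false alarms.
acc_b, kappa_b = accuracy_and_kappa(tn=196, fp=4, fn=8, tp=17)

print(f"Model A: accuracy={acc_a:.2f}, kappa={kappa_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}, kappa={kappa_b:.2f}")
```

In this invented example, Model A reaches an accuracy near 0.89 with a Kappa of 0, the same pattern as the SCiO-based models, whereas Model B's Kappa is much higher because it actually identifies waxy seeds.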