Nansen Christian, Imtiaz Mohammad S, Mesgaran Mohsen B, Lee Hyoseok
Department of Entomology and Nematology, University of California, Davis, USA.
Department of Entomology and Nematology, UC Davis Briggs Hall, Room 367, Davis, CA, 95616, USA.
Plant Methods. 2022 Jun 3;18(1):74. doi: 10.1186/s13007-022-00912-z.
Optical sensing solutions are being developed and adopted to classify a wide range of biological objects, including crop seeds. Performance assessment of optical classification models remains both a priority and a challenge.
As training data, we acquired hyperspectral imaging data from 3646 individual tomato seeds (germination yes/no) from two tomato varieties. We performed three experimental data manipulations: (1) object assignment error, the effect of individual objects in the training data being assigned to the wrong class; (2) spectral repeatability, the effect of introducing known ranges (0-10%) of stochastic noise to individual reflectance values; and (3) training data set size, the effect of reducing the number of observations in the training data. The effects of each of these experimental data manipulations were characterized and quantified based on classifications with two functions [linear discriminant analysis (LDA) and support vector machine (SVM)].
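The three data manipulations can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The function names are hypothetical, labels are assumed to be binary (1 = germinated, 0 = not germinated), and the stochastic noise is assumed to be a uniform multiplicative perturbation of up to ±p% per reflectance value. The abstract does not specify the noise model.

```python
import random

def flip_labels(labels, error_rate, rng):
    """Object assignment error: reassign a fraction of binary labels to the wrong class."""
    n_flip = round(error_rate * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]  # germinated (1) <-> not germinated (0)
    return flipped

def add_noise(spectrum, max_pct, rng):
    """Spectral repeatability: perturb each reflectance by up to +/- max_pct percent.
    Assumption: uniform multiplicative noise, applied independently per value."""
    return [r * (1 + rng.uniform(-max_pct, max_pct) / 100) for r in spectrum]

def subsample(X, y, keep_fraction, rng):
    """Training set size: keep a random fraction of observations (without replacement)."""
    n_keep = round(keep_fraction * len(y))
    idx = rng.sample(range(len(y)), n_keep)
    return [X[i] for i in idx], [y[i] for i in idx]

# Hypothetical toy data: 10 seeds, 3-band reflectance spectra
rng = random.Random(0)
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
X = [[0.20, 0.35, 0.50] for _ in y]

y_err = flip_labels(y, 0.2, rng)                  # 20% of labels assigned to the wrong class
X_noisy = [add_noise(s, 10, rng) for s in X]      # up to 10% noise per reflectance value
X_sub, y_sub = subsample(X, y, 0.8, rng)          # training data reduced by 20%
```

In an experiment like the one described, each manipulated training set would then be used to fit the LDA and SVM classifiers, and accuracy would be tracked against the manipulation level.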
For both classification functions, accuracy decreased linearly in response to the introduction of object assignment error and to the experimental reduction of spectral repeatability. We also demonstrated that experimentally reducing the training data by 20% had a negligible effect on classification accuracy. The LDA and SVM classification algorithms were then applied to independent validation seed samples. LDA-based classifications predicted seed germination with RMSE = 10.56 (variety 1) and 26.15 (variety 2), and SVM-based classifications predicted seed germination with RMSE = 10.44 (variety 1) and 12.58 (variety 2).
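The abstract does not state how RMSE was computed. One plausible reading, assumed here, is that each validation sample's predicted germination percentage (the share of its seeds classified as germinating) is compared with its observed germination percentage. The helper names and numbers below are hypothetical and only illustrate the arithmetic.

```python
import math

def germination_percent(seed_predictions):
    """Percent of seeds in a sample classified as germinating (1 = yes, 0 = no)."""
    return 100 * sum(seed_predictions) / len(seed_predictions)

def rmse(predicted, observed):
    """Root mean square error between paired predicted and observed values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))

# Hypothetical example: three validation samples of four seeds each
sample_predictions = [[1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 1]]
observed_germination = [50.0, 25.0, 100.0]
predicted_germination = [germination_percent(s) for s in sample_predictions]
error = rmse(predicted_germination, observed_germination)
```

Here only the first sample is mispredicted (75% vs. 50%), so the RMSE is the square root of 25²/3.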
We believe this is the first study in which optical seed classification included both a thorough performance evaluation of two separate classification functions based on experimental data manipulations and the application of classification models to validation seed samples not included in the training data. The proposed experimental data manipulations are discussed in a broader context and in terms of their general relevance, and they are suggested as methods for in-depth performance assessment of optical classification models.