Jin Zhao-xi, Zhang Xiu-juan, Luo Fu-yi, An Dong, Zhao Sheng-yi, Ran Hang, Yan Yan-lu
Guang Pu Xue Yu Guang Pu Fen Xi. 2016 Dec;36(12):3920-5.
To address the problem of classifying a larger number of wheat varieties, we use near-infrared spectroscopy for qualitative analysis. Enlarging the modeling sample set adds information to the model, but it also introduces redundancy, which increases modeling time and storage space; the modeling sample set therefore needs to be reduced through sample selection. If samples are selected blindly, some information is lost and the model performs worse. Building on traditional selection methods, we propose k nearest neighbor-density sample selection. The experiments use near-infrared diffuse reflectance spectra of wheat seeds collected over multiple days. The raw spectra are first preprocessed and subjected to feature extraction; modeling samples are then selected by three methods (random sampling, k nearest neighbor, and k nearest neighbor-density); finally, BPR (Biomimetic Pattern Recognition) and BPRI (Biomimetic Pattern Recognition Improved) models are built. With the BPR model, k nearest neighbor-density selection gives the best results and also substantially reduces the modeling sample size. With the BPRI model, k nearest neighbor-density selection gives results much better than random sampling and slightly better than k nearest neighbor, while its modeling sample set is much smaller than that of k nearest neighbor. These results show that k nearest neighbor-density sample selection greatly reduces the modeling sample size while preserving model quality, and that it is clearly effective for wheat variety classification.
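The abstract does not spell out how k nearest neighbor-density selection works, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a sample's density is the inverse of its mean distance to its k nearest neighbors, and that selection proceeds greedily: the densest uncovered sample is kept and its k neighbors are marked as redundant. The function name `knn_density_select`, the parameter `k`, and the greedy covering rule are all hypothetical.

```python
import math


def knn_density_select(samples, k=5):
    """Return indices of a reduced subset of `samples` (hypothetical scheme).

    Assumption: density = 1 / (mean distance to the k nearest neighbors).
    Samples are visited from densest to sparsest; each kept sample marks
    its k nearest neighbors as covered, so they are not kept themselves.
    """
    n = len(samples)
    k = min(k, n - 1)
    if k < 1:
        # Zero or one sample: nothing can be removed.
        return list(range(n))

    neighbours, density = [], []
    for i in range(n):
        # Euclidean distances from sample i to every other sample.
        nearest = sorted(
            (math.dist(samples[i], samples[j]), j) for j in range(n) if j != i
        )[:k]
        neighbours.append([j for _, j in nearest])
        mean_dist = sum(d for d, _ in nearest) / k
        density.append(1.0 / (mean_dist + 1e-12))  # guard against zero distance

    covered = [False] * n
    selected = []
    # Visit samples in order of decreasing density.
    for i in sorted(range(n), key=lambda idx: -density[idx]):
        if covered[i]:
            continue
        selected.append(i)
        covered[i] = True
        for j in neighbours[i]:
            covered[j] = True
    return selected
```

In a pipeline like the one described, such a function would presumably be applied per variety to the extracted feature vectors before building the BPR or BPRI model. Every discarded sample lies among the k nearest neighbors of some kept sample, which is one way to shrink the modeling set without selecting blindly.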