Sun Jun, Zhou Xin, Wu Xiaohong, Zhang Xiaodong, Li Qinglin
Key Laboratory of Tobacco Biology & Processing, Ministry of Agriculture, Qingdao 266101, China; School of Electrical and Information Engineering of Jiangsu University, Zhenjiang 212013, China.
Biochem Biophys Res Commun. 2016 Feb 26;471(1):226-32. doi: 10.1016/j.bbrc.2016.01.125. Epub 2016 Jan 22.
Fast identification of moisture content in tobacco plant leaves plays a key role in the tobacco cultivation industry and benefits the management of tobacco plants on the farm. To identify the moisture content of tobacco plant leaves quickly and nondestructively, this study proposed a method that couples Mahalanobis distance with Monte Carlo cross validation (MD-MCCV) to eliminate outlier samples. Hyperspectral data from 200 tobacco plant leaf samples covering 20 moisture gradients were acquired with a FieldSpec® 3 spectrometer. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS) and median smoothing (MS) were used to preprocess the raw spectra. Mahalanobis distance (MD), Monte Carlo cross validation (MCCV) and Mahalanobis distance coupled with Monte Carlo cross validation (MD-MCCV) were then applied to detect outlier samples in the raw spectra and in the spectra produced by each of the four smoothing methods. The successive projections algorithm (SPA) was used to extract the most influential wavelengths, and multiple linear regression (MLR) was applied to build prediction models from the preprocessed spectral features at those characteristic wavelengths. The four best prediction models were MD-MCCV-SG (Rp² = 0.8401, RMSEP = 0.1355), MD-MCCV-RPS (Rp² = 0.8030, RMSEP = 0.1274), MD-MCCV-KS (Rp² = 0.8117, RMSEP = 0.1433) and MD-MCCV-MS (Rp² = 0.9132, RMSEP = 0.1162). In eliminating outlier samples from tobacco plant leaves across the 20 moisture gradients, the MD-MCCV algorithm outperformed the MD algorithm alone, the MCCV algorithm alone and modeling without any sample pretreatment, so MD-MCCV can be used to eliminate outlier samples during spectral preprocessing.
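The abstract does not say how MD and MCCV are coupled, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that Mahalanobis distance is computed on PCA scores of the spectra, that MCCV flags samples by their mean held-out MLR prediction error, and that the MD screen runs before the MCCV screen. All function names, thresholds, the fixed wavelength indices (a stand-in for SPA, which is not implemented here) and the synthetic data are hypothetical. The R² shown is one common definition (1 - SSE/SST) and may differ from the one used in the study.

```python
import numpy as np


def mahalanobis_outliers(spectra, n_components=2, threshold=3.0):
    """Flag samples whose Mahalanobis distance in PCA score space exceeds threshold."""
    centered = spectra - spectra.mean(axis=0)
    # PCA via SVD: scores are projections onto the leading principal components.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(scores, rowvar=False)))
    # Scores are already mean-centered, so they serve directly as deviations.
    dist = np.sqrt(np.einsum("ij,jk,ik->i", scores, inv_cov, scores))
    return dist > threshold, dist


def mccv_errors(X, y, n_splits=200, calib_fraction=0.8, seed=0):
    """Mean absolute held-out MLR prediction error per sample over random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_calib = int(round(calib_fraction * n))
    design = np.column_stack([np.ones(n), X])  # MLR design matrix with intercept
    err_sum = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        calib, held = perm[:n_calib], perm[n_calib:]
        coef, *_ = np.linalg.lstsq(design[calib], y[calib], rcond=None)
        err_sum[held] += np.abs(design[held] @ coef - y[held])
        counts[held] += 1
    return err_sum / np.maximum(counts, 1)


def md_mccv_screen(spectra, y, wavelengths, md_threshold=3.0, mccv_factor=3.0):
    """Assumed coupling: drop MD-flagged samples, then drop samples whose MCCV
    error exceeds mccv_factor times the median error. Returns kept indices."""
    md_flags, _ = mahalanobis_outliers(spectra, threshold=md_threshold)
    keep = np.flatnonzero(~md_flags)
    errors = mccv_errors(spectra[keep][:, wavelengths], y[keep])
    return keep[errors <= mccv_factor * np.median(errors)]


def rmsep_and_r2(y_true, y_pred):
    """RMSEP and R^2 (defined here as 1 - SSE/SST) for a prediction set."""
    resid = y_true - y_pred
    rmsep = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmsep, r2


def make_demo_data(seed=0, n_samples=40, n_wavelengths=50):
    """Synthetic stand-in data (NOT the study's spectra) with two planted outliers."""
    rng = np.random.default_rng(seed)
    moisture = np.linspace(0.1, 0.9, n_samples)
    slope = 0.2 + 0.3 * np.sin(np.linspace(0.0, np.pi, n_wavelengths))
    noise = rng.normal(0.0, 0.01, (n_samples, n_wavelengths))
    spectra = 1.0 - moisture[:, None] * slope + noise
    spectra[7] += np.linspace(-0.5, 0.5, n_wavelengths)  # planted spectral outlier
    reference = moisture.copy()
    reference[20] += 0.5  # planted reference-value outlier
    return spectra, reference
```

Calling `md_mccv_screen(*make_demo_data(), [5, 25, 45])` returns the indices of the samples retained for modeling. In this toy setup the MD step is what catches the distorted spectrum (sample 7), while the MCCV step catches the sample whose reference moisture value disagrees with its spectrum (sample 20). That difference is one possible motivation for combining the two screens.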