[Vis-NIR spectroscopic pattern recognition combined with SG smoothing applied to breed screening of transgenic sugarcane].

Author information

Liu Gui-Song, Guo Hao-Song, Pan Tao, Wang Ji-Hua, Cao Gan

Publication information

Guang Pu Xue Yu Guang Pu Fen Xi. 2014 Oct;34(10):2701-6.

Abstract

UNLABELLED

Savitzky-Golay (SG) smoothing mode screening and principal component analysis (PCA) were combined with supervised linear discriminant analysis (LDA) and, separately, with unsupervised hierarchical clustering analysis (HCA) for non-destructive visible and near-infrared (Vis-NIR) breed screening of transgenic sugarcane. A framework of calibration, prediction, and validation based on random division and model stability was proposed. A total of 456 sugarcane leaf samples at the elongation stage were collected from the field, comprising 306 transgenic (positive) samples containing the Bt and Bar genes and 150 non-transgenic (negative) samples. A total of 156 samples (50 negative and 106 positive) were randomly selected as the validation set; the remaining 300 samples (100 negative and 200 positive) were used as the modeling set, which was subdivided 50 times into a calibration set (50 negative and 100 positive, 150 samples in total) and a prediction set (50 negative and 100 positive, 150 samples in total). The range of the number of SG smoothing points was expanded, while some higher-order derivative modes were removed because of their small absolute values, leaving a total of 264 smoothing modes for screening. Pairwise combinations of the first three principal components were used, and the optimal combination was selected according to model performance. SG-PCA-LDA and SG-PCA-HCA models were established for all divisions of the calibration and prediction sets and all SG smoothing modes, and the model parameters were optimized according to the average prediction performance over all divisions to ensure modeling stability. Finally, the models were validated with the validation set. With SG smoothing, the accuracy and stability of PCA-LDA and PCA-HCA modeling improved significantly.
For the optimal SG-PCA-LDA model, the recognition rates of positive and negative validation samples were 94.3% and 96.0%, respectively; for the optimal SG-PCA-HCA model, they were 92.5% and 98.0%.
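
The sample division described above can be illustrated with a minimal sketch. It assumes simple random draws within each class; the function names, the seed, and the tuple-based sample labels are hypothetical and are not taken from the paper. Only the set sizes and the number of divisions come from the abstract.

```python
import random

N_NEG, N_POS = 150, 306   # class sizes reported in the abstract
N_DIVISIONS = 50          # number of calibration/prediction divisions


def draw(ids, k, rng):
    """Randomly split ids into (k drawn items, remainder) without replacement."""
    shuffled = ids[:]
    rng.shuffle(shuffled)
    return shuffled[:k], shuffled[k:]


def build_framework(seed=0):
    """Return the held-out validation set and the 50 modeling-set divisions."""
    rng = random.Random(seed)  # the seed is an illustrative assumption
    neg = [("neg", i) for i in range(N_NEG)]
    pos = [("pos", i) for i in range(N_POS)]

    # Validation set: 50 negative + 106 positive samples, drawn once.
    val_neg, model_neg = draw(neg, 50, rng)
    val_pos, model_pos = draw(pos, 106, rng)
    validation = val_neg + val_pos

    # Modeling set (100 negative + 200 positive) is divided 50 times into
    # calibration and prediction sets of 50 negative + 100 positive each.
    divisions = []
    for _ in range(N_DIVISIONS):
        cal_neg, pred_neg = draw(model_neg, 50, rng)
        cal_pos, pred_pos = draw(model_pos, 100, rng)
        divisions.append((cal_neg + cal_pos, pred_neg + pred_pos))
    return validation, divisions


validation, divisions = build_framework()
print(len(validation), len(divisions), len(divisions[0][0]), len(divisions[0][1]))
# prints: 156 50 150 150
```

Averaging a model's prediction performance over all 50 divisions, rather than over a single division, is what lets parameter selection favor stable models; the validation samples stay outside that loop.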

CONCLUSION

Vis-NIR spectroscopic pattern recognition combined with SG smoothing can accurately recognize transgenic sugarcane leaves and provides a convenient screening method for transgenic sugarcane breeding.
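
As a rough illustration of the preprocessing named in the abstract, the sketch below applies one SG smoothing mode to a set of spectra and extracts the first three principal component scores, from which the pairwise combinations are formed. It is a NumPy-only sketch under stated assumptions: the window length, polynomial degree, derivative order, and the synthetic spectra are all hypothetical, the grid of 264 modes used by the authors is not given in the abstract, and the LDA and HCA classification steps are omitted.

```python
from itertools import combinations

import numpy as np


def sg_coefficients(window, degree, deriv=0):
    """SG weights from a local polynomial least-squares fit (odd window > degree)."""
    half = window // 2
    x = np.arange(-half, half + 1, dtype=float)
    A = np.vander(x, degree + 1, increasing=True)  # columns: x^0, x^1, ...
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of x^deriv;
    # multiplying by deriv! turns it into the derivative at the window centre.
    factorial = 1.0
    for k in range(2, deriv + 1):
        factorial *= k
    return np.linalg.pinv(A)[deriv] * factorial


def sg_filter(spectra, window, degree, deriv=0):
    """Apply one SG mode to each row; edge points without a full window are dropped."""
    w = sg_coefficients(window, degree, deriv)
    return np.array([np.correlate(row, w, mode="valid") for row in spectra])


def pca_scores(X, n_components=3):
    """Scores on the leading principal components of mean-centred data (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Synthetic stand-in for measured spectra: 20 samples x 200 wavelengths.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(20, 200)).cumsum(axis=1)

smoothed = sg_filter(spectra, window=11, degree=2, deriv=1)  # hypothetical mode
scores = pca_scores(smoothed, n_components=3)

# Pairwise combinations of the first three principal components.
pairs = list(combinations(range(3), 2))
features = {pair: scores[:, list(pair)] for pair in pairs}
print(smoothed.shape, scores.shape, pairs)
```

In a full screening loop, each SG mode and each component pair would feed a classifier, and the combination with the best average prediction performance across divisions would be kept.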
