Key Laboratory of Drug Quality Control and Pharmacovigilance, Department of Analytical Chemistry, China Pharmaceutical University, Nanjing, 210009, China.
Spectrochim Acta A Mol Biomol Spectrosc. 2012 Apr;89:18-24. doi: 10.1016/j.saa.2011.12.006. Epub 2011 Dec 13.
Most herbal medicines can be processed to meet different therapeutic requirements. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near-infrared (NIR) spectra. Least squares support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernel (linear, polynomial, and radial basis function (RBF)) were evaluated to optimize the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was also built, with the successive projections algorithm (SPA) applied beforehand to select an appropriate subset of wavelengths. The three methods (LS-SVM, RF, and SPA-LDA) were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. Model performance was evaluated with 50 runs of 10-fold cross-validation. LS-SVM with the RBF kernel (RBF LS-SVM) performed better than LS-SVM with the other two kernels. RF, RBF LS-SVM, and SPA-LDA each classified all test samples correctly. The mean error rates over the 50 runs of 10-fold cross-validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained with RBF LS-SVM, while RF was fast in both training and prediction.
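The evaluation protocol described in the abstract (50 runs of 10-fold cross-validation on 40 raw and 40 processed samples, reported as a mean error rate) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the spectra are synthetic random numbers, the fold assignment is stratified by class, and a nearest-centroid classifier stands in for the LS-SVM, RF, and SPA-LDA models, none of which is reproduced here. The function names are hypothetical and the resulting error rate has no relation to the values reported in the study.

```python
import random

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with equal class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_centroids(X, y):
    """Stand-in classifier (not one of the study's models): per-class mean spectrum."""
    centroids = {}
    for cls in set(y):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        centroids[cls] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sqdist(centroids[cls]))

def repeated_cv_error(X, y, runs=50, k=10, seed=0):
    """Mean misclassification rate over `runs` repetitions of k-fold CV."""
    rng = random.Random(seed)
    run_errors = []
    for _ in range(runs):
        wrong = 0
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            wrong += sum(predict(model, X[i]) != y[i] for i in fold)
        run_errors.append(wrong / len(y))  # each sample is tested once per run
    return sum(run_errors) / len(run_errors)

def make_synthetic_spectra(n_per_class=40, n_wavelengths=20, seed=1):
    """Synthetic placeholder data: 40 'raw' and 40 'processed' pseudo-spectra."""
    rng = random.Random(seed)
    X, y = [], []
    for cls, offset in (("raw", 0.0), ("processed", 0.5)):
        for _ in range(n_per_class):
            X.append([offset + rng.gauss(0.0, 1.0) for _ in range(n_wavelengths)])
            y.append(cls)
    return X, y

X, y = make_synthetic_spectra()
error_rate = repeated_cv_error(X, y)
print(f"mean CV error rate over 50 runs: {error_rate:.2%}")
```

Repeating the 10-fold split 50 times with different shuffles averages out the dependence of the error estimate on any single partition, which matters for a dataset of only 80 samples.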