Center for Analytical Chemistry, Institute for Agrobiotechnology (IFA-Tulln), Konrad Lorenz Straße 20, A-3430, Tulln, Austria.
Mycotoxin Res. 2003 Jun;19(2):149-53. doi: 10.1007/BF02942955.
Validation methods for chemometric models, which are necessary for evaluating model performance and prediction ability, are presented. Reference methods with known performance can be employed for comparison studies. Other validation methods include test set validation and cross validation, in which some samples are set aside for testing. The choice of method depends mainly on the size of the original dataset: test set validation is suitable for large datasets (>50 samples), whereas cross validation is the best method for small to medium datasets (<50 samples). In this study, the K-nearest neighbour (KNN) algorithm was used as a reference method for the classification of contaminated and blank corn samples, and a partial least squares (PLS) regression model was evaluated using full cross validation. Mid-infrared spectra were collected using the attenuated total reflection (ATR) technique, and the fingerprint range (800-1800 cm⁻¹) of 21 maize samples contaminated with 300-2600 µg/kg deoxynivalenol (DON) was investigated. Separation efficiency after principal component analysis/cluster analysis (PCA/CA) classification was 100%. Calibration of the PLS model gave a correlation coefficient of r = 0.9926 with a root mean square error of calibration (RMSEC) of 95.01. Cross validation gave r = 0.8111 with a root mean square error of cross validation (RMSECV) of 494.5. No outliers were reported.
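The difference between a calibration error (RMSEC) and a full cross validation error (RMSECV) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-component PLS1 model written in plain Python, leave-one-out as the form of full cross validation, and synthetic data (21 samples, 5 made-up "spectral" variables) in place of the ATR spectra. All function names, the `loadings` values, and the noise level are hypothetical.

```python
import math
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores along w, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict y for one sample x from a fitted model."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def pearson_r(a, b):
    """Pearson correlation coefficient between two sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic stand-in data (assumption): 21 samples, 5 variables, with
# reference values spread over the 300-2600 range mentioned in the text.
rng = random.Random(0)
n_samples = 21
loadings = [0.8, 1.5, -1.1, 0.4, 2.0]  # hypothetical sensitivities
y = [rng.uniform(300.0, 2600.0) for _ in range(n_samples)]
X = [[b * yi / 1000.0 + rng.gauss(0.0, 0.3) for b in loadings] for yi in y]

# Calibration: fit on all samples and predict the same samples.
model = fit_pls1(X, y)
cal_pred = [predict(model, x) for x in X]
rmsec = rmse(y, cal_pred)
r_cal = pearson_r(y, cal_pred)

# Full (leave-one-out) cross validation: each sample is predicted by a
# model that was fitted without it.
cv_pred = []
for i in range(n_samples):
    X_train = X[:i] + X[i + 1:]
    y_train = y[:i] + y[i + 1:]
    cv_pred.append(predict(fit_pls1(X_train, y_train), X[i]))
rmsecv = rmse(y, cv_pred)
r_cv = pearson_r(y, cv_pred)

print(f"calibration:      r = {r_cal:.4f}, RMSEC  = {rmsec:.1f}")
print(f"cross validation: r = {r_cv:.4f}, RMSECV = {rmsecv:.1f}")
```

Because every cross-validated prediction comes from a model that never saw the sample in question, RMSECV is usually the more realistic estimate of prediction error. A large gap between RMSEC and RMSECV suggests that the calibration figures are optimistic. A real application would normally use more than one latent variable and choose their number by cross validation.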