Institute of Medicinal Biotechnology, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, People's Republic of China.
AAPS PharmSciTech. 2011 Jun;12(2):738-45. doi: 10.1208/s12249-011-9638-6. Epub 2011 Jun 4.
This article proposes an empirical solution to the problem of how many clusters of complex samples should be selected to construct the training set for a universal near-infrared quantitative model based on the Naes method. The sample spectra were hierarchically classified into clusters using Ward's algorithm with Euclidean distance. With the spectra divided into two clusters, 1/50 of the largest heterogeneity value in the cluster with the larger variation was set as the threshold that determines the total number of clusters. One sample was then randomly selected from each cluster to construct the training set, so the number of training samples equaled the number of clusters. The strategy was applied to 98 batches of rifampicin capsules with active pharmaceutical ingredient (API) contents ranging from 50.1% to 99.4%. For the resulting rifampicin capsule model, the root mean square errors of cross validation and prediction were 2.54% and 2.31%, respectively. The model was then evaluated in terms of outlier diagnostics, accuracy, precision, and robustness. The same training set selection strategy was also used to revalidate the models for cefradine capsules, roxithromycin tablets, and erythromycin ethylsuccinate tablets, with satisfactory results. Overall, the results indicate that this training set sample selection strategy assists in the quick and accurate construction of quantitative models using near-infrared spectroscopy.
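The selection strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the "heterogeneity value" is taken to be the Ward merge height (scaled so that two single spectra merge at their Euclidean distance), the "cluster with larger variation" is taken to be whichever of the two top-level clusters was formed at the greater height, and the function names `agglomerate` and `select_training_indices` are hypothetical. The clustering loop is written in plain Python only to keep the example self-contained; a library routine would normally replace it.

```python
import math
import random


def centroid(members, spectra):
    """Mean spectrum of the samples whose indices are in `members`."""
    columns = zip(*(spectra[i] for i in members))
    return [sum(col) / len(members) for col in columns]


def ward_height(a, b, spectra):
    """Ward merge height for clusters a and b (lists of sample indices).

    Assumption: this height stands in for the abstract's "heterogeneity".
    For two single samples it reduces to their Euclidean distance.
    """
    ca, cb = centroid(a, spectra), centroid(b, spectra)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return math.sqrt(2 * len(a) * len(b) / (len(a) + len(b)) * d2)


def agglomerate(spectra, min_clusters=1, max_height=math.inf):
    """Merge the cheapest pair repeatedly until a stop condition is met.

    Returns a list of (members, height_at_which_the_cluster_was_formed).
    """
    clusters = [([i], 0.0) for i in range(len(spectra))]
    while len(clusters) > min_clusters:
        h, i, j = min(
            (ward_height(clusters[i][0], clusters[j][0], spectra), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if h > max_height:
            break  # every remaining merge would exceed the threshold
        merged = (clusters[i][0] + clusters[j][0], h)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def select_training_indices(spectra, fraction=1 / 50, seed=0):
    """Pick one random sample per cluster, with the cluster count set by
    `fraction` of the largest height inside the two top-level clusters."""
    top_two = agglomerate(spectra, min_clusters=2)
    threshold = fraction * max(height for _, height in top_two)
    clusters = agglomerate(spectra, max_height=threshold)
    rng = random.Random(seed)
    chosen = sorted(rng.choice(members) for members, _ in clusters)
    return chosen, clusters, threshold
```

Calling `select_training_indices(spectra)` on a list of equal-length spectra returns the indices of the selected training samples, the clusters they were drawn from, and the threshold that was used. The `fraction` parameter defaults to the 1/50 stated in the abstract; the fixed `seed` only makes the random pick reproducible.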