Mahdi Wael A, Alhowyan Adel, Obaidullah Ahmad J
Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Saudi Arabia.
Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Saudi Arabia.
Sci Rep. 2025 Jul 11;15(1):25139. doi: 10.1038/s41598-025-10417-z.
This study investigates advanced regression techniques for modeling complex drug release patterns from a high-dimensional dataset comprising more than 1500 spectral variables plus categorical inputs. The spectral data were collected by Raman spectroscopy to analyze drug release from a solid dosage formulation coated with polysaccharides; the dataset contains 155 samples, with drug release measured at 2, 8, and 24 h. The drug is 5-aminosalicylic acid (5-ASA), intended for colonic delivery, and its release was estimated from the Raman data together with the categorical parameters. Three models were evaluated: Kernel Ridge Regression (KRR), Kernel-based Extreme Learning Machine (K-ELM), and Quantile Regression (QR). Hyperparameters were tuned with the Sailfish Optimizer (SFO), and K-fold cross-validation was used to enhance predictive accuracy. KRR performed best, achieving an R² of 0.997 on the training set and 0.992 on the test set, with a mean squared error (MSE) of 0.0004. K-ELM and QR achieved lower test-set R² values of 0.923 and 0.817, respectively. The key innovation lies in integrating these non-linear regression models with robust data preprocessing: dimensionality reduction via Principal Component Analysis (PCA), categorical feature encoding through Leave-One-Out (LOO) encoding, and outlier detection using Isolation Forest. The study contributes a comprehensive framework for managing high-dimensional, heterogeneous datasets and emphasizes the effectiveness of optimization strategies in predictive modeling. By accurately predicting the release of 5-ASA from polysaccharide-coated formulations, these models can aid the design of targeted colonic delivery formulations with optimized release kinetics, ultimately enhancing the efficacy of treatments for colonic diseases.
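The preprocessing and modeling steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The data are synthetic (only the 155 x 1500 shape mirrors the abstract), and the categorical variable, the function names, and the fixed values of `n_components`, `lam`, and `gamma` are assumptions made for illustration. The sketch covers PCA, leave-one-out target encoding, RBF kernel ridge regression, and K-fold cross-validation. SFO tuning and Isolation Forest outlier removal are omitted. scikit-learn provides `PCA`, `KernelRidge`, and `IsolationForest` for comparable steps.

```python
import numpy as np

def loo_encode_train(cats, y):
    """Leave-one-out target encoding: each row gets the mean target of the
    other rows in its category (global mean for singleton categories)."""
    cats, y = np.asarray(cats), np.asarray(y, dtype=float)
    out = np.full(y.shape, y.mean())
    for c in np.unique(cats):
        m = cats == c
        if m.sum() > 1:
            out[m] = (y[m].sum() - y[m]) / (m.sum() - 1)
    return out

def loo_encode_new(cats_new, cats_train, y_train):
    """Encode held-out rows with the full training mean of their category."""
    cats_train = np.asarray(cats_train)
    y_train = np.asarray(y_train, dtype=float)
    means = {c: y_train[cats_train == c].mean() for c in np.unique(cats_train)}
    return np.array([means.get(c, y_train.mean()) for c in np.asarray(cats_new)])

def pca_fit(X, n_components):
    """Return the column means and the top principal directions (via SVD)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T

def rbf_kernel(A, B, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) for all row pairs."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, lam, gamma):
    """Closed-form kernel ridge solution: alpha = (K + lam * I)^-1 y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

def cv_r2(X_spec, cats, y, n_components=10, lam=1e-2, gamma=0.05, k=5, seed=0):
    """K-fold CV; all preprocessing is fitted on the training fold only."""
    cats = np.asarray(cats)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, te in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        mean, comps = pca_fit(X_spec[tr], n_components)
        Z_tr, Z_te = (X_spec[tr] - mean) @ comps, (X_spec[te] - mean) @ comps
        sd = Z_tr.std(axis=0) + 1e-12  # scale PCA scores to unit variance
        F_tr = np.column_stack([Z_tr / sd, loo_encode_train(cats[tr], y[tr])])
        F_te = np.column_stack([Z_te / sd, loo_encode_new(cats[te], cats[tr], y[tr])])
        y_mean = y[tr].mean()  # this KRR has no intercept, so center the target
        alpha = krr_fit(F_tr, y[tr] - y_mean, lam, gamma)
        pred = krr_predict(F_tr, alpha, F_te, gamma) + y_mean
        ss_res = ((y[te] - pred) ** 2).sum()
        ss_tot = ((y[te] - y[te].mean()) ** 2).sum()
        scores.append(1 - ss_res / ss_tot)
    return scores

# Synthetic stand-in data (NOT the study's Raman measurements).
rng = np.random.default_rng(0)
n, p = 155, 1500
latent = rng.normal(size=(n, 3))
X_spec = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
cats = rng.integers(0, 3, size=n)  # hypothetical categorical input
y = np.tanh(latent[:, 0]) + 0.3 * cats + 0.05 * rng.normal(size=n)

scores = cv_r2(X_spec, cats, y)
print("per-fold R^2:", np.round(scores, 3))
```

Fitting PCA and the target encoding inside each fold, rather than once on the full dataset, keeps information from the held-out samples out of the features. This matters with only 155 samples and a target-dependent encoder.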