Southeast University, Nanjing, China.
Chem Biol Drug Des. 2010 May;75(5):494-505. doi: 10.1111/j.1747-0285.2010.00958.x.
Because most support vector regression (SVR) models used in quantitative structure-activity relationship studies have not been fully optimized, this study proposes simultaneous optimization of the SVR parameters and evaluates it on a set of novel kinase insert domain receptor/vascular endothelial growth factor receptor-2 inhibitors, including naphthalene- and indazole-based compounds. After feature selection with a genetic algorithm, the final SVR model was built on an optimal set of six descriptors, and simultaneous optimization was then carried out on this model. Specifically, the global optimum was located by grid search over the joint parameter space defined by cost (C), gamma and epsilon: the performance of SVR was evaluated and recorded for each (C, gamma, epsilon) combination, which produced a large volume of results. Using the data decomposition strategies described in the main paper, the best performance was achieved at C = 1.2, gamma = 0.15 and epsilon = 0.065. For comparison, a linear model based on genetic algorithm-multiple linear regression was also developed. Both models were validated using leave-one-out cross-validation and external validation. The significantly higher R² (0.908, 0.837) and lower root-mean-square error (0.237, 0.311) for the 45 training and 16 test samples, compared with those of genetic algorithm-multiple linear regression (0.764, 0.700 and 0.402, 0.421), confirm the superior performance of genetic algorithm-support vector regression. The robustness and predictive ability of this model were evaluated further. The results introduce not only the idea of simultaneous optimization in support vector regression but also an efficient strategy for estimating the vascular endothelial growth factor receptor-2 inhibitory activity of novel naphthalene- and indazole-based compounds.
Moreover, the study provides some insights into the structural features related to the biological activity of these compounds, which may help guide the design of novel, potent vascular endothelial growth factor receptor-2/kinase insert domain receptor inhibitors.
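The joint grid search over (C, gamma, epsilon) described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the function names, the toy data, the grid values, and above all the crude functional-subgradient solver, which merely stands in for a proper SVR library. It shows only the overall shape of the procedure: every parameter combination is scored by leave-one-out cross-validation, all scores are recorded, and the combination with the lowest root-mean-square error is kept.

```python
import itertools
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def fit_predict_svr(X_train, y_train, x_new, C, gamma, epsilon, steps=200, lr=0.01):
    """Hypothetical stand-in for an SVR solver (assumption, not LIBSVM).

    Runs functional subgradient descent on the epsilon-insensitive loss with
    an RBF kernel, then predicts the activity of x_new.
    """
    n = len(X_train)
    K = [[rbf(a, b, gamma) for b in X_train] for a in X_train]
    beta = [0.0] * n
    bias = sum(y_train) / n
    for _ in range(steps):
        f = [sum(beta[j] * K[i][j] for j in range(n)) + bias for i in range(n)]
        # Subgradient sign: zero inside the epsilon tube, +/-1 outside it.
        s = [0.0 if abs(f[i] - y_train[i]) <= epsilon
             else math.copysign(1.0, f[i] - y_train[i]) for i in range(n)]
        # Shrink coefficients (regularization) and step against the loss.
        beta = [(1.0 - lr) * b - lr * C * si for b, si in zip(beta, s)]
        # Bias step is scaled by 1/n to keep the update small.
        bias -= lr * C * sum(s) / n
    return sum(beta[j] * rbf(X_train[j], x_new, gamma) for j in range(n)) + bias


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_predict(X, y, params):
    """Leave-one-out: predict each sample from a model fit on all the others."""
    return [
        fit_predict_svr(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], **params)
        for i in range(len(X))
    ]


def joint_grid_search(X, y, grid):
    """Score every (C, gamma, epsilon) combination and keep all records."""
    names = list(grid)
    results = []
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        preds = loo_predict(X, y, params)
        results.append({"params": params,
                        "rmse": rmse(y, preds),
                        "r2": r_squared(y, preds)})
    best = min(results, key=lambda r: r["rmse"])
    return best, results


# Toy one-descriptor data set (hypothetical, not the paper's compounds).
X_demo = [[0.25 * i] for i in range(10)]
y_demo = [math.sin(x[0]) for x in X_demo]
grid_demo = {"C": [0.5, 5.0], "gamma": [0.1, 1.0], "epsilon": [0.01, 0.1]}

best_demo, results_demo = joint_grid_search(X_demo, y_demo, grid_demo)
print(best_demo["params"], round(best_demo["rmse"], 3))
```

Searching the three parameters jointly rather than one at a time matters because their effects interact: the best gamma at one value of C need not be the best at another. The cost is that the number of evaluations grows as the product of the grid sizes, which is consistent with the large volume of recorded results mentioned above.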