School of Chemistry and Chemical Engineering, Sun Yat-sen University, Guangzhou 510275, People's Republic of China.
J Comput Chem. 2010 Jul 30;31(10):1956-68. doi: 10.1002/jcc.21471.
Genetic algorithms (GAs) were applied to quantitative structure-activity relationship (QSAR) models built with artificial neural networks (ANNs), both to select variables from the molecular descriptors and to improve the back-propagation training algorithm. Leave-one-out cross validation was used to assess the validity of the resulting ANN models and of the preferred variable combinations produced by the GA. A self-adaptive GA-ANN model was established using a new estimate function designed to avoid over-fitting during ANN training. Compared with the variables selected in two recent QSAR studies based on stepwise multiple linear regression (MLR), the variables selected by the self-adaptive GA-ANN model were better suited to constructing ANN models: they gave a higher cross-validation (CV) coefficient (Q²) and a lower root mean square deviation, both for the established model and for the prediction of biological activity. Further validation by leave-multiple-out, Y-randomization, and external validation showed that the established GA-ANN models were superior to the MLR models in both stability and predictive power. The self-adaptive GA-ANN approach therefore offers a promising route to improving QSAR models.
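The validation statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' GA-ANN code: a plain least-squares model stands in for the ANN, the function names (`fit_linear`, `loo_q2_rmsd`, `y_randomization_q2`) are hypothetical, and Q² is computed under one common convention, Q² = 1 - PRESS / SS_tot, with SS_tot taken about the mean of all observed activities. Any other model can be swapped in through the `fit` argument.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in model (assumption, not the paper's ANN):
    least-squares fit with intercept, returned as a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef

def loo_q2_rmsd(X, y, fit=fit_linear):
    """Leave-one-out cross validation: refit without sample i,
    predict sample i, then summarise as Q^2 and RMSD."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i           # every sample except i
        model = fit(X[mask], y[mask])
        preds[i] = model(X[i:i + 1])[0]    # predict the held-out sample
    press = np.sum((y - preds) ** 2)       # predictive residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - press / ss_tot, np.sqrt(press / n)

def y_randomization_q2(X, y, n_rounds=20, seed=0, fit=fit_linear):
    """Y-randomization: shuffle the activities and recompute Q^2.
    A model that is not a chance correlation should score far lower here."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return np.array([loo_q2_rmsd(X, rng.permutation(y), fit)[0]
                     for _ in range(n_rounds)])
```

In a variable-selection setting, `loo_q2_rmsd` would be called on the descriptor columns chosen by each candidate subset, so that subsets can be ranked by Q²; how the paper's own estimate function combines such terms is not stated in the abstract.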