Tetko I V, Villa A E, Livingstone D J
Institute of Bioorganic and Petroleum Chemistry, Ukrainian Academy of Sciences, Kiev, Ukraine.
J Chem Inf Comput Sci. 1996 Jul-Aug;36(4):794-803. doi: 10.1021/ci950204c.
Quantitative structure-activity relationship (QSAR) studies usually require estimating the relevance of a very large set of initial variables. Identifying the most important variables should, in theory, allow all pattern recognition methods to generalize better. This study introduces and investigates five pruning algorithms designed to estimate the importance of input variables in feed-forward artificial neural networks (ANNs) trained by the back-propagation algorithm, and to prune irrelevant variables in a statistically reliable way. The analyzed algorithms produced similar variable estimates for simulated data sets, but differences were detected for real QSAR examples. The prediction ability of the ANNs improved after redundant input variables were pruned. The statistical coefficients computed by ANNs for the QSAR examples were better than those of multiple linear regression. Limitations of the proposed algorithms and the potential use of ANNs are discussed.
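The abstract does not spell out the five pruning algorithms, so the following is only a minimal sketch of the general idea, not a reconstruction of any of them. It assumes a one-hidden-layer tanh network trained by plain full-batch back-propagation on simulated data, and it scores each input by the mean absolute partial derivative of the output with respect to that input (a generic sensitivity heuristic chosen here for illustration). The function names, the simulated target, and the 0.3 pruning threshold are all hypothetical.

```python
import numpy as np

def train_mlp(X, y, n_hidden=6, lr=0.05, epochs=4000, seed=0):
    """Train a one-hidden-layer tanh network by full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=n_hidden)       # hidden -> output weights
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                   # hidden activations, shape (n, h)
        err = H @ W2 + b2 - y                      # prediction error, shape (n,)
        # Gradients of the loss 0.5 * mean(err**2)
        gW2 = H.T @ err / n
        gb2 = err.mean()
        dH = np.outer(err, W2) * (1.0 - H**2)      # back-propagated through tanh
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2

def input_sensitivity(X, W1, b1, W2):
    """Mean absolute d(output)/d(input_i) over the data set (assumed heuristic)."""
    H = np.tanh(X @ W1 + b1)
    grads = ((1.0 - H**2) * W2) @ W1.T             # per-sample gradients, shape (n, d)
    return np.abs(grads).mean(axis=0)

# Simulated data: only inputs 0 and 1 influence the target; inputs 2-4 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + X[:, 1] + 0.05 * rng.normal(size=300)

W1, b1, W2, b2 = train_mlp(X, y)
relevance = input_sensitivity(X, W1, b1, W2)

# Hypothetical pruning rule: keep inputs scoring at least 30% of the top score.
keep = np.flatnonzero(relevance >= 0.3 * relevance.max())
print("relevance per input:", np.round(relevance, 3))
print("inputs kept:", keep)
```

In a real QSAR setting one would retrain the network on the retained inputs and compare its predictions on held-out compounds before and after pruning, since a single relevance score from one training run is not a statistically reliable basis for discarding a variable.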