Rorije E, Van Wezel M C, Peijnenburg W J
Laboratory for Ecotoxicology, National Institute of Public Health and Environmental Protection, Bilthoven, The Netherlands.
SAR QSAR Environ Res. 1995;4(4):219-35. doi: 10.1080/10629369508032982.
This study systematically analyzes the predictive capabilities of models built with backpropagation neural networks (BPNN) to corroborate the hypothesis that a BPNN can model the interaction terms in group contribution models without these terms being added explicitly as descriptors. The data used for comparison are the reactivities of 275 organic compounds towards the atmospheric OH radical. This dataset was selected for its internal consistency, reliability and relatively large size. While training the network, the minimal Mean Squared Error (MSE) on a test set was used as the stop criterion. This avoids overfitting on the training data and is most likely to give the best generalizing network. A network trained with a designed training and test set is compared with networks trained on randomly constructed training and test sets. The BPNN model based on the designed training and test set gives not only the best model but also the best predictability on an external validation set, compared both with linear models built with the same training and validation sets and with BPNN models based on randomly constructed training and test sets. The performance of the designed BPNN model is comparable to that of an existing model which includes interaction terms.
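The stop criterion described above (keep the network at the point of minimal MSE on a test set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `train_with_early_stopping`, the network size, the learning rate, the `patience` cutoff and the synthetic two-descriptor data with a product (interaction) term are all assumptions chosen for illustration, and none of it uses the OH-radical dataset.

```python
import math
import random


def train_with_early_stopping(train, test, n_hidden=3, lr=0.05,
                              max_epochs=500, patience=50, seed=0):
    """Train a one-hidden-layer backpropagation network and return the
    weights from the epoch with the lowest MSE on the test set.
    All hyperparameters here are illustrative assumptions."""
    rng = random.Random(seed)
    n_in = len(train[0][0])
    # Each hidden unit has n_in input weights plus a bias (last entry).
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    # The linear output unit has n_hidden weights plus a bias (last entry).
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(wj[:-1], x)) + wj[-1])
             for wj in w_h]
        y = sum(w * hj for w, hj in zip(w_o[:-1], h)) + w_o[-1]
        return h, y

    def mse(data):
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    # Best state seen so far: (test MSE, epoch, hidden weights, output weights).
    best = (mse(test), 0, [row[:] for row in w_h], w_o[:])
    for epoch in range(1, max_epochs + 1):
        for x, t in train:
            h, y = forward(x)
            err = y - t  # gradient of 0.5 * (y - t)^2 with respect to y
            for j in range(n_hidden):
                # Backpropagate through the output weight before updating it.
                delta_h = err * w_o[j] * (1.0 - h[j] ** 2)
                w_o[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_h[j][i] -= lr * delta_h * x[i]
                w_h[j][-1] -= lr * delta_h
            w_o[-1] -= lr * err
        test_mse = mse(test)
        if test_mse < best[0]:
            best = (test_mse, epoch, [row[:] for row in w_h], w_o[:])
        elif epoch - best[1] >= patience:
            break  # no improvement on the test set for `patience` epochs
    return best


def make_data(n, rng):
    """Synthetic data: two descriptors plus their product as an interaction."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, x[0] + x[1] + x[0] * x[1]))
    return data


data_rng = random.Random(1)
train_set, test_set = make_data(40, data_rng), make_data(20, data_rng)
best_mse, best_epoch, w_h, w_o = train_with_early_stopping(train_set, test_set)
print(f"lowest test-set MSE {best_mse:.4f} reached at epoch {best_epoch}")
```

Because the test set steers when training stops, it cannot also serve as an unbiased check of the final model, which is why a separate external validation set is needed for the comparison described in the abstract.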