Yan Aixia, Gasteiger Johann, Krug Michael, Anzali Soheila
Computer-Chemie-Centrum and Institut für Organische Chemie, Universität Erlangen-Nürnberg, Germany.
J Comput Aided Mol Des. 2004 Feb;18(2):75-87. doi: 10.1023/b:jcam.0000030031.81235.05.
Several quantitative models for predicting the aqueous solubility of organic compounds were developed from a diverse dataset of 2084 compounds using multiple linear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) 18 topological descriptors; and (2) 32 radial distribution function codes representing the 3D structure of a molecule, plus eight additional descriptors. The dataset was divided into a training set and a test set using a Kohonen self-organizing neural network. The backpropagation neural network models gave good predictions: with the 18 topological descriptors, a correlation coefficient of 0.92 and a standard deviation of 0.62 were achieved for the 936 compounds in the test set; with the 3D descriptors, a correlation coefficient of 0.90 and a standard deviation of 0.73 were achieved for the 866 compounds in the test set. The models were also tested on a second dataset, and the relationship between the two datasets was examined with a Kohonen self-organizing neural network.
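The workflow summarized in the abstract (map compounds onto a Kohonen self-organizing map, draw training and test sets from the map, fit a linear model, and report a correlation coefficient and standard deviation on the test set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptor matrix and solubility values are synthetic, the map size, learning schedule, and the "alternate compounds within each neuron" split rule are arbitrary choices made here, and the standard deviation is taken as the root mean squared residual, which may differ from the definition used in the paper. Only the multiple linear regression model is shown; the backpropagation network is omitted.

```python
import numpy as np

def train_som(X, grid=(6, 6), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular Kohonen map; returns weights (n_neurons, n_features)."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)
    # Initialise neuron weights from randomly chosen samples.
    W = X[rng.choice(len(X), size=gx * gy, replace=True)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.5    # neighbourhood radius shrinks
            bmu = np.argmin(((W - X[idx]) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)    # grid distance to BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[idx] - W)
            step += 1
    return W

def som_split(X, W):
    """Alternate the compounds in each occupied neuron between training and test.

    This split rule is an assumption made for illustration only.
    """
    bmus = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
    train, test = [], []
    for n in np.unique(bmus):
        members = np.flatnonzero(bmus == n)
        train.extend(members[::2])
        test.extend(members[1::2])
    return np.array(train), np.array(test)

def fit_mlr(X, y):
    """Multiple linear regression with an intercept, solved by least squares."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

# Synthetic stand-in data: 300 "compounds", 5 "descriptors" (hypothetical values).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
true_coef = np.array([1.0, -0.5, 0.8, 0.3, -1.2])
y = X @ true_coef + 0.3 * rng.normal(size=300)

W = train_som(X)
train_idx, test_idx = som_split(X, W)

coef = fit_mlr(X[train_idx], y[train_idx])
pred = predict_mlr(coef, X[test_idx])

r = np.corrcoef(y[test_idx], pred)[0, 1]              # correlation coefficient
s = np.sqrt(np.mean((y[test_idx] - pred) ** 2))       # root mean squared residual
print(f"test set: n={len(test_idx)}, r={r:.2f}, s={s:.2f}")
```

Drawing both sets from every occupied neuron is meant to keep the training and test sets spread over the same regions of descriptor space, which is the usual motivation for a map-based split over a purely random one.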