Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods.

Authors

Yan Aixia, Gasteiger Johann, Krug Michael, Anzali Soheila

Affiliation

Computer-Chemie-Centrum and Institut für Organische Chemie, Universität Erlangen-Nürnberg, Germany.

Publication

J Comput Aided Mol Des. 2004 Feb;18(2):75-87. doi: 10.1023/b:jcam.0000030031.81235.05.

PMID:15287695

Abstract

Several quantitative models for predicting the aqueous solubility of organic compounds were developed from a diverse dataset of 2084 compounds, using multiple linear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) 18 topological descriptors; and (2) 32 radial distribution function (RDF) codes representing the 3D structure of a molecule, plus eight additional descriptors. The dataset was divided into a training set and a test set using Kohonen's self-organizing neural network. The backpropagation neural network models gave good predictions: with the 18 topological descriptors, a correlation coefficient of 0.92 and a standard deviation of 0.62 were achieved for the 936 compounds in the test set; with the 3D descriptors, a correlation coefficient of 0.90 and a standard deviation of 0.73 were achieved for the 866 compounds in the test set. The models were also tested on another dataset, and the relationship between the two datasets was examined with Kohonen's self-organizing neural network.
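
The RDF codes mentioned above encode the 3D structure as a smoothed distribution of interatomic distances. The sketch below is a minimal illustration, assuming the commonly used form g(r) = Σ_{i<j} A_i A_j exp(-B (r - r_ij)^2), where A_i is an atomic property, r_ij is the distance between atoms i and j, and B is a smoothing parameter. The function name `rdf_code`, the distance grid (32 points from 1.0 to 16.5 Å), B = 100, unit atomic properties in the example, and the omission of any scaling factor are all assumptions, not the settings reported in the paper.

```python
import numpy as np


def rdf_code(coords, props, n_points=32, r_min=1.0, r_max=16.5, B=100.0):
    """Hypothetical helper: g(r) = sum_{i<j} A_i A_j exp(-B (r - r_ij)^2) on a distance grid.

    coords: (N, 3) atomic coordinates; props: (N,) atomic properties A_i.
    Grid limits, n_points and B are illustrative assumptions; no scaling factor is applied.
    """
    coords = np.asarray(coords, dtype=float)
    props = np.asarray(props, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)           # every atom pair with i < j
    r_ij = np.linalg.norm(coords[i] - coords[j], axis=1)
    AA = props[i] * props[j]                            # product of pair properties
    r = np.linspace(r_min, r_max, n_points)             # 32 points spaced 0.5 apart
    # Gaussian smoothing of every pair distance, summed at each grid point r.
    return (AA[None, :] * np.exp(-B * (r[:, None] - r_ij[None, :]) ** 2)).sum(axis=1)


# Two atoms 2.0 apart with unit properties: the code peaks at the grid point r = 2.0.
g = rdf_code([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.0, 1.0])
print(g.shape, int(np.argmax(g)))  # → (32,) 2
```

In practice A_i would be a physicochemical atomic property and the 3D coordinates would come from a structure generator; both are left as inputs here.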

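The workflow summarized in the abstract (descriptors, a training/test split based on a Kohonen map, regression, and evaluation by correlation coefficient and standard deviation) can be sketched with NumPy as below. This is a hedged illustration on synthetic data, not the authors' implementation: the 8×8 map size, the training schedule, the rule of assigning compounds alternately to the training and test sets within each winning neuron, and the definition of s as the sample standard deviation of test-set residuals are all assumptions. Only the multiple linear regression model is shown; a backpropagation network would replace `fit_mlr` and `predict_mlr`.

```python
import numpy as np


def train_som(X, rows=8, cols=8, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular Kohonen map; returns neuron weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_neurons = rows * cols
    # Initialise the neuron weights with randomly chosen input vectors.
    W = X[rng.choice(len(X), n_neurons, replace=len(X) < n_neurons)].astype(float)
    # (row, col) position of every neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
            winner = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weights
            W += lr * h[:, None] * (X[i] - W)
            step += 1
    return W


def som_split(X, W):
    """Hypothetical split rule: alternate compounds between training and test within each neuron."""
    winners = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
    train, test = [], []
    for neuron in np.unique(winners):
        members = np.flatnonzero(winners == neuron)
        train.extend(members[0::2])  # 1st, 3rd, ... compound mapped here -> training set
        test.extend(members[1::2])   # 2nd, 4th, ... compound mapped here -> test set
    return np.array(train, dtype=int), np.array(test, dtype=int)


def fit_mlr(X, y):
    """Multiple linear regression by least squares, with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]


def r_and_s(y_true, y_pred):
    """Pearson correlation coefficient and (assumed) sample standard deviation of residuals."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    s = np.std(y_true - y_pred, ddof=1)
    return r, s


# Synthetic stand-in data: 300 "compounds" x 18 descriptors, with a linear "log S" target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 18))
y = X @ rng.normal(size=18) + rng.normal(scale=0.5, size=300)

X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # scale descriptors before mapping
W = train_som(X_std)
train_idx, test_idx = som_split(X_std, W)
coef = fit_mlr(X[train_idx], y[train_idx])
r, s = r_and_s(y[test_idx], predict_mlr(coef, X[test_idx]))
print(len(train_idx), len(test_idx), round(float(r), 2), round(float(s), 2))
```

Splitting within each neuron means that the training and test sets cover the same regions of descriptor space, which is the usual motivation for a map-based split rather than a purely random one.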