Highly correlating distance/connectivity-based topological indices 5. Accurate prediction of liquid density of organic molecules using PCR and PC-ANN.

Author information

Shamsipur Mojtaba, Ghavami Raouf, Sharghi Hashem, Hemmateenejad Bahram

Affiliation

Department of Chemistry, Razi University, Kermanshah, Iran.

Publication information

J Mol Graph Model. 2008 Nov;27(4):506-11. doi: 10.1016/j.jmgm.2008.09.005. Epub 2008 Sep 13.

DOI:10.1016/j.jmgm.2008.09.005

PMID:18948045

Abstract

The primary goal of a quantitative structure-property relationship (QSPR) is to identify a set of structure-based numerical descriptors that can be mathematically linked to a property of interest. Recently, we proposed some new topological indices (Sh indices) based on the distance sum and connectivity of a molecular graph; they are derived directly from two-dimensional molecular topology and are intended for use in QSAR/QSPR studies. In this study, the ability of these indices to predict the liquid densities (ρ) of a large and diverse set of 521 organic liquid compounds was examined. Ten different Sh indices were calculated for each molecule. Linear and non-linear models were built using principal component regression (PCR) and a principal component-artificial neural network (PC-ANN) with a back-propagation learning algorithm, respectively. A correlation ranking procedure was used to rank the principal components and enter them into the models. PCR analysis of the data showed that the proposed Sh indices could explain about 91.82% of the variation in the density data, while the variation explained by the ANN modeling was more than 97.93%. The predictive ability of the models was evaluated using an external test set of molecules; root mean square errors of prediction of 0.0308 g ml⁻¹ and 0.0248 g ml⁻¹ were obtained for the liquid densities of the external compounds by the linear and non-linear models, respectively.
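The linear part of the workflow described in the abstract (PCR with correlation-ranked principal components, evaluated by the root mean square error of prediction on an external set) can be illustrated with a minimal sketch. This is not the authors' code. The data are synthetic stand-ins for the ten Sh indices and the densities, and the function names (`fit_pcr_corr_ranked`, `predict_pcr`, `rmsep`), the autoscaling step, the train/test split, and the number of retained components are all assumptions made for illustration. The PC-ANN model is not sketched here.

```python
import numpy as np


def fit_pcr_corr_ranked(X, y, n_components):
    """Fit PCR, keeping the PCs most correlated with y (hypothetical helper)."""
    # Autoscale the descriptors (assumed preprocessing, not stated in the abstract).
    x_mean = X.mean(axis=0)
    x_std = X.std(axis=0, ddof=1)
    Z = (X - x_mean) / x_std
    y_mean = y.mean()
    yc = y - y_mean

    # PCA via SVD: rows of Vt are the loadings, U * s are the PC scores.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s

    # Correlation ranking: order PCs by |corr(score, y)| instead of by eigenvalue.
    denom = np.linalg.norm(scores, axis=0) * np.linalg.norm(yc)
    corr = np.abs(scores.T @ yc) / np.where(denom > 0, denom, np.inf)
    keep = np.argsort(corr)[::-1][:n_components]

    # Least-squares regression of the centered property on the selected scores.
    coef, *_ = np.linalg.lstsq(scores[:, keep], yc, rcond=None)
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "loadings": Vt[keep], "coef": coef}


def predict_pcr(model, X):
    """Project new samples onto the retained loadings and apply the regression."""
    Z = (X - model["x_mean"]) / model["x_std"]
    return Z @ model["loadings"].T @ model["coef"] + model["y_mean"]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic data only: 120 "molecules", 10 "descriptors", a linear "density".
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
w = rng.normal(scale=0.05, size=10)
y = 0.9 + X @ w + rng.normal(scale=0.01, size=120)

# Hold out the last 30 samples as an external test set (arbitrary split).
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

model_full = fit_pcr_corr_ranked(X_train, y_train, n_components=10)
y_pred_full = predict_pcr(model_full, X_test)
rmsep_full = rmsep(y_test, y_pred_full)

model_small = fit_pcr_corr_ranked(X_train, y_train, n_components=3)
rmsep_small = rmsep(y_test, predict_pcr(model_small, X_test))

print(f"RMSEP, 10 PCs: {rmsep_full:.4f}")
print(f"RMSEP,  3 PCs: {rmsep_small:.4f}")
```

When all ten components are retained, this sketch reduces to ordinary least squares on the descriptors. Ranking by correlation only changes the result when fewer components are kept, in which case the components most related to the property enter the model first, regardless of how much descriptor variance they carry.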
