Goyal Rakesh K, Singh G, Madan A K
Sharma University of Health Sciences, Rohtak, India.
Naturwissenschaften. 2011 Oct;98(10):871-87. doi: 10.1007/s00114-011-0839-3. Epub 2011 Sep 4.
An in silico approach comprising decision tree (DT), random forest (RF) and moving average analysis (MAA) was successfully employed to develop models for predicting the anti-tumor activity of bisphosphonates. A dataset of 65 analogues, covering both nitrogen-containing and non-nitrogen-containing bisphosphonates, was selected for the present study. Four refinements of the eccentric distance sum topochemical index, termed augmented eccentric distance sum topochemical indices 1-4 [formula: see text], have been proposed so as to significantly augment discriminating power. The proposed topological indices (TIs), along with the existing TIs (>1,400), were subsequently used to develop models for predicting the anti-tumor activity of bisphosphonates. A total of 43 descriptors of diverse nature, drawn from a large pool of molecular descriptors calculated with E-Dragon software (version 1.0) and an in-house computer program, were selected for model development with DT, RF and MAA. DT identified two TIs as most important and classified the analogues of the dataset with an accuracy of 97% in the training set and 90.7% in the tenfold cross-validated set. RF correctly classified the analogues with an accuracy of 89.2%. Four independent models developed through MAA predicted the activity of the analogues of the dataset with an accuracy of 87.6% to 89%. The statistical significance of the proposed models was assessed through intercorrelation analysis, specificity, sensitivity and the Matthews correlation coefficient. The proposed models offer vast potential for providing lead structures for the development of potent anti-tumor agents for the treatment of cancer that has spread to the bone.
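The abstract reports accuracy, sensitivity, specificity and the Matthews correlation coefficient without spelling out how they relate. As a minimal sketch, assuming a binary active/inactive classification, the snippet below computes all four from the cells of a confusion matrix. The function name `classification_metrics` and the counts used are hypothetical and purely illustrative; they are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC for a binary classifier.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative and
    false-negative counts. Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of truly active compounds predicted as active.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of truly inactive compounds predicted as inactive.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: ranges from -1 to +1 and stays
    # informative when the two classes are of unequal size.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not data from the paper).
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the Matthews correlation coefficient alongside accuracy is a common choice when active and inactive classes are imbalanced, since accuracy alone can look high for a model that mostly predicts the majority class.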