Exploring the QSAR's predictive truthfulness of the novel N-tuple discrete derivative indices on benchmark datasets.

Author information

Martínez-Santiago O, Marrero-Ponce Y, Vivas-Reyes R, Rivera-Borroto O M, Hurtado E, Treto-Suarez M A, Ramos Y, Vergara-Murillo F, Orozco-Ugarriza M E, Martínez-López Y

Affiliations

a Department of Chemical Sciences, Central University 'Martha Abreu' of Las Villas, Santa Clara, Cuba.

b Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Quito, Ecuador.

Publication information

SAR QSAR Environ Res. 2017 May;28(5):367-389. doi: 10.1080/1062936X.2017.1326403.

Abstract

Graph derivative indices (GDIs) have recently been defined over N atoms (N = 2, 3 and 4) simultaneously. They are based on the concept of the derivative in discrete mathematics (finite differences), by analogy with the derivative in classical mathematical analysis. These molecular descriptors (MDs) encode topo-chemical and topo-structural information through the derivative of a molecular graph with respect to a given event (S) over duplex, triplex and quadruplex relations of atoms (vertices). GDIs have been applied successfully to describe physicochemical properties such as reactivity, solubility and chemical shift, among others, and in several comparative quantitative structure-activity/property relationship (QSAR/QSPR) studies. Although previous modelling studies with these indices gave satisfactory results, a new and more rigorous analysis is needed to assess the true predictive performance of this novel structure codification. In the present paper, therefore, the performance of these approaches in QSAR studies is assessed and statistically validated, and compared with that of other QSAR procedures reported in the literature. To this end, QSARs were developed on eight chemical datasets widely used as benchmarks for evaluating and validating QSAR methods and/or many different MDs (mainly 3D MDs). Three- to seven-variable QSAR models were built for each dataset, following the original training/test split. The models were developed with multiple linear regression (MLR) coupled with a genetic algorithm as the wrapper feature-selection technique in the MobyDigs software. Each family of GDIs (duplex, triplex and quadruplex) behaved similarly across all modelling, with some exceptions. However, when all families were used in combination, the results were quantitatively better than those reported by other authors in similar experiments. Comparisons based on the external correlation coefficient (q²_ext) showed that the GDI-based models have superior predictive ability in seven of the eight datasets analysed, outperforming methodologies based on similar or more complex techniques and confirming the good predictive power of the models obtained. A non-parametric comparison of the q²_ext values showed significant differences from the results reported so far: the models based on DIVATI's indices presented the best global performance and yielded significantly better predictions than the 12 QSAR procedures (based on 0D to 3D descriptors) used in the comparison. GDIs are therefore suitable for encoding molecular structure and are a good alternative for building QSARs to predict physicochemical, biological and environmental endpoints.
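The model-building step described above pairs MLR with a genetic algorithm that searches over fixed-size descriptor subsets. MobyDigs' internals are not described in the abstract, so the following is only a minimal stand-alone sketch of a GA wrapper around MLR in pure Python. The fitness function (leave-one-out q²), truncation selection with elitism, union-based crossover and swap mutation, the function names `ols_fit`, `loo_q2` and `ga_select`, and all default parameters are assumptions made for illustration, not the authors' settings.

```python
import random


def ols_fit(X, y):
    """MLR with intercept: solve (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting. Returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        A[col], A[piv] = A[piv], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for i in range(p):
            if i != col:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[col])]
    return [A[i][p] for i in range(p)]


def loo_q2(X, y, cols):
    """Leave-one-out cross-validated q² of the MLR model on descriptor columns `cols`."""
    y = list(y)
    sub = [[r[c] for c in cols] for r in X]
    press = 0.0
    for i in range(len(y)):
        beta = ols_fit(sub[:i] + sub[i + 1:], y[:i] + y[i + 1:])
        pred = beta[0] + sum(b * x for b, x in zip(beta[1:], sub[i]))
        press += (y[i] - pred) ** 2
    ybar = sum(y) / len(y)
    return 1.0 - press / sum((yi - ybar) ** 2 for yi in y)


def ga_select(X, y, k, pop_size=12, generations=30, p_mut=0.3, seed=0):
    """Evolve k-descriptor subsets; fitness is the LOO q² of the MLR model."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    cache = {}

    def fitness(subset):
        if subset not in cache:
            try:
                cache[subset] = loo_q2(X, y, list(subset))
            except ValueError:  # singular subset: rank it last
                cache[subset] = float("-inf")
        return cache[subset]

    pop = [tuple(sorted(rng.sample(range(n_feat), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))  # crossover: draw k genes from both parents
            child = set(rng.sample(pool, k))
            if rng.random() < p_mut:  # mutation: swap one descriptor out, another in
                out = rng.choice(sorted(child))
                inn = rng.choice([f for f in range(n_feat) if f not in child])
                child = (child - {out}) | {inn}
            children.append(tuple(sorted(child)))
        pop = parents + children
    return max(pop, key=fitness)
```

On synthetic data in which the response depends only on descriptors 0 and 2, `ga_select(X, y, k=2)` is expected to return `(0, 2)`. In a study like the one summarised here, the search would be repeated for each model size (three to seven variables) and the selected models then checked on the external test set.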

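The abstract's main yardstick is the external correlation coefficient q²_ext computed on the held-out test set, but it does not state the formula. The sketch below assumes the common form in which the test-set prediction error sum of squares (PRESS) is normalised by the squared deviations of the observed test values from the training-set mean (often labelled Q²_F1). Variants that use the test-set mean instead also exist. The function name `q2_ext` is hypothetical.

```python
from statistics import fmean


def q2_ext(y_obs, y_pred, y_train_mean):
    """External q² on a test set (assumed Q²_F1 form):
    1 - sum((y_obs - y_pred)^2) / sum((y_obs - y_train_mean)^2)."""
    if len(y_obs) != len(y_pred) or not y_obs:
        raise ValueError("need equal-length, non-empty observed/predicted lists")
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press / ss


y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
print(q2_ext([2.0, 4.0], [2.5, 3.5], fmean(y_train)))  # prints 0.75
```

Here the training mean is 3.0, PRESS = 0.25 + 0.25 = 0.5 and the denominator is 1 + 1 = 2, so q²_ext = 1 - 0.5/2 = 0.75.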
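The abstract reports a non-parametric comparison of q²_ext values across the benchmark datasets but does not name the test. One common choice for comparing several methods over several datasets is the Friedman test on per-dataset ranks. The sketch below computes each method's average rank (rank 1 = highest q²_ext, ties share the mean rank) and the Friedman chi-square statistic without a tie correction. The choice of test and the names `ranks_desc` and `friedman` are assumptions, not the authors' procedure.

```python
def ranks_desc(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def friedman(scores):
    """scores[d][m]: q²_ext of method m on dataset d (higher is better).
    Returns (average rank per method, Friedman chi-square statistic)."""
    n, k = len(scores), len(scores[0])
    if any(len(row) != k for row in scores):
        raise ValueError("every dataset needs a score for every method")
    rank_rows = [ranks_desc(row) for row in scores]
    avg = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(a * a for a in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2
```

For three datasets scored as `[[0.9, 0.8, 0.7], [0.85, 0.6, 0.7], [0.95, 0.9, 0.5]]`, the average ranks are 1, 7/3 and 8/3 and the statistic is 14/3 (about 4.67), to be compared with a chi-square distribution with k - 1 = 2 degrees of freedom. SciPy's `scipy.stats.friedmanchisquare` returns the same statistic (with a tie correction) together with a p-value.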