Martínez-Santiago Oscar, Millán-Cabrera Reisel, Marrero-Ponce Yovani, Barigye Stephen J, Martínez-López Yoan, Torrens Francisco, Pérez-Giménez Facundo
Unit of Computer-Aided Molecular "Biosilico" Discovery and Bioinformatic Research (CAMD-BIR Unit), Facultad de Química y Farmacia, Universidad Central "Marta Abreu" de Las Villas, Carretera a Camajuani Km 5 1/2, Santa Clara, 54830, Villa Clara, Cuba. fax: 963543156; phone: 963543156.
Doctorado en Toxicología Ambiental, Facultad de Química Farmacéutica, Universidad de Cartagena, Cartagena de Indias, Bolívar, Colombia.
Mol Inform. 2014 May;33(5):343-68. doi: 10.1002/minf.201300173. Epub 2014 May 12.
This report presents a new mathematical method for codifying chemical structure information, based on the concept of the derivative of a molecular graph (G) with respect to a given event (S). The derivative over each pair of atoms in the molecule is defined as ∂G/∂S(v_i, v_j) = (f_i - 2f_ij + f_j)/f_ij, where f_i (or f_j) is the individual frequency of atom i (or j) and f_ij is the reciprocal frequency of atoms i and j. These frequencies characterize the participation intensity of atom pairs in S. Here, the event space is composed of molecular sub-graphs that participate in the formation of the G skeleton; it can be complete (representing all possible connected sub-graphs) or comprise sub-graphs of certain orders or types, or combinations of these. The atom-level graph derivative index, Δ_i, is expressed as a linear combination of all atom-pair derivatives that include atom i. Global [total or local (group or atom-type)] indices are obtained by applying so-called invariants over the vector of Δ_i values. The novel molecular descriptors (MDs) are validated using a data set of 28 alkyl alcohols and other benchmark data sets proposed by the International Academy of Mathematical Chemistry. In addition, the boiling point of the alcohols, the adrenergic blocking activity of N,N-dimethyl-2-halo-phenethylamines, and physicochemical properties of polychlorinated biphenyls and octanes are modeled. These models exhibit satisfactory predictive power compared with other 0-3D indices implemented successfully by other researchers. The tendencies of the proposed indices are also investigated using examples of various types of molecular structures, covering chain lengthening, branching, heteroatom content, and multiple bonds. Moreover, the relationship between the atom-based derivative indices and the ¹⁷O NMR of a series of ethers and carbonyls indicates that the new MDs encode electronic, topological and steric information. Linear independence between the graph derivative indices and other 0-3D MDs is demonstrated by principal component analysis on a data set of 41 heterogeneous molecules. It is concluded that the graph derivative indices are independent indices containing important structural information for use in QSPR/QSAR and drug design studies, and that they permit simpler, more interpretable and more robust mathematical models than most of those reported in the literature.
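The atom-pair derivative defined in the abstract can be illustrated with a short Python sketch. It rests on assumptions that the abstract does not state: the frequencies f_i and f_ij are taken to be plain counts of the sub-graphs in S that contain atom i, or both atoms i and j; pairs that never co-occur (f_ij = 0) are skipped; and the linear combination giving Δ_i uses unit coefficients. The function names (`frequencies`, `pair_derivatives`, `atom_indices`) are hypothetical and are not taken from the authors' software.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequencies(subgraphs):
    """Count the sub-graphs containing each atom (f_i) and each atom pair (f_ij).

    Assumption: a frequency is a simple occurrence count over the event space.
    """
    f_atom, f_pair = Counter(), Counter()
    for subgraph in subgraphs:
        atoms = sorted(set(subgraph))
        f_atom.update(atoms)
        f_pair.update(combinations(atoms, 2))  # keys are (i, j) with i < j
    return f_atom, f_pair


def pair_derivatives(subgraphs):
    """Return {(i, j): (f_i - 2*f_ij + f_j) / f_ij} for every co-occurring pair."""
    f_atom, f_pair = frequencies(subgraphs)
    return {
        (i, j): (f_atom[i] - 2 * f_ij + f_atom[j]) / f_ij
        for (i, j), f_ij in f_pair.items()
    }


def atom_indices(subgraphs):
    """Sum the pair derivatives involving each atom (unit coefficients assumed)."""
    delta = defaultdict(float)
    for (i, j), value in pair_derivatives(subgraphs).items():
        delta[i] += value
        delta[j] += value
    return dict(delta)


if __name__ == "__main__":
    # Event space: all connected sub-graphs of a three-atom chain 0-1-2.
    events = [{0}, {1}, {2}, {0, 1}, {1, 2}, {0, 1, 2}]
    print(pair_derivatives(events))  # {(0, 1): 1.5, (0, 2): 4.0, (1, 2): 1.5}
    print(atom_indices(events))      # {0: 5.5, 1: 3.0, 2: 5.5}
```

In this toy chain the terminal atoms 0 and 2 each occur in three sub-graphs and the central atom in four. The pair (0, 2) co-occurs only once, so its derivative is the largest, and the terminal atoms end up with higher Δ_i values than the central one.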