García-Jacas César R, Cabrera-Leyva Lisset, Marrero-Ponce Yovani, Suárez-Lezcano José, Cortés-Guzmán Fernando, Pupo-Meriño Mario, Vivas-Reyes Ricardo
Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México.
Grupo de Investigación de Inteligencia Artificial (AIRES), Facultad de Informática, Universidad de Camagüey, Camagüey, Cuba.
J Cheminform. 2018 Oct 25;10(1):51. doi: 10.1186/s13321-018-0306-7.
Several topological (2D) and geometric (3D) molecular descriptors (MDs) are calculated by aggregating local vertex/edge invariants (LOVIs/LOEIs). The aggregation typically relies on norm-, mean- and statistic-based (non-fuzzy) operators, under the assumption that the LOVIs/LOEIs are mutually independent (orthogonal) values. These operators are based on additive and/or linear measures and, consequently, cannot encode information from interrelated criteria. Because LOVIs/LOEIs are not orthogonal values, non-additive (fuzzy) measures can be used instead to encode the interrelation among them.
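As a point of reference, the non-fuzzy aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`minkowski_norm`, `aggregate_non_fuzzy`), the choice of operators, and the example LOVI values are hypothetical and are not taken from the paper or from the QuBiLS-MIDAS software.

```python
import statistics


def minkowski_norm(lovis, p=2):
    """p-norm of a vector of atom contributions: (sum |x_i|^p)^(1/p)."""
    return sum(abs(x) ** p for x in lovis) ** (1.0 / p)


def aggregate_non_fuzzy(lovis):
    """Collapse a vector of LOVIs into scalar descriptors.

    Each operator treats the entries as independent values, so the
    result does not depend on which atom carries which value.
    """
    return {
        "N1": minkowski_norm(lovis, 1),      # Manhattan norm
        "N2": minkowski_norm(lovis, 2),      # Euclidean norm
        "mean": statistics.fmean(lovis),     # arithmetic mean
        "std": statistics.pstdev(lovis),     # population standard deviation
    }


# Hypothetical atom-level contributions for a three-atom fragment.
example_lovis = [0.2, 0.5, 0.9]
print(aggregate_non_fuzzy(example_lovis))
```

Every operator here is a symmetric function of the individual values, so it carries no information about how groups of atoms act together, which is the limitation that the abstract attributes to additive and linear measures.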
General approaches are proposed to compute fuzzy 2D/3D-MDs from the contribution of each atom (LOVIs) or covalent bond (LOEIs) within a molecule, using the Choquet integral as the fuzzy aggregation operator. The Choquet integral-based operator differs markedly from the operators commonly used to calculate 2D/3D-MDs: it performs a reordering step to fuse the LOVIs/LOEIs according to their magnitudes and, in addition, it accounts for the interrelation among them through a fuzzy measure. With this operator, fuzzy definitions can be derived from traditional or recent MDs, for instance fuzzy Randić-like connectivity indices, fuzzy Balaban-like indices and fuzzy Kier-Hall connectivity indices, among others. To demonstrate the feasibility of this operator, the QuBiLS-MIDAS 3D-MDs were used as a case study and, as a result, a module was built into the corresponding software to compute them ( http://tomocomd.com/qubils-midas ). This makes it the only software reported in the literature that can be employed to determine Choquet integral-based fuzzy MDs. Moreover, regression models were built on eight chemical datasets in order to compare the models based on the non-fuzzy QuBiLS-MIDAS 3D-MDs with those based on the fuzzy QuBiLS-MIDAS 3D-MDs. The models built with the fuzzy QuBiLS-MIDAS 3D-MDs achieved the best performance, which was statistically corroborated through the Wilcoxon signed-rank test.
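The two ingredients named above, the reordering step and the fuzzy measure, can be illustrated with a minimal sketch of the discrete Choquet integral. The abstract does not specify which fuzzy measure is used, so the cardinality-based measure below, mu(A) = (|A|/n)^q, is a hypothetical stand-in chosen only because it is easy to verify by hand. The function names and example values are likewise assumptions and do not reflect the QuBiLS-MIDAS implementation.

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral of `values` with respect to the set function `mu`.

    `mu` maps a frozenset of indices to a number, with mu(empty set) = 0 and
    mu(all indices) = 1, and is assumed to be monotone under set inclusion.
    """
    n = len(values)
    # Reordering step: indices sorted by ascending magnitude of their values.
    order = sorted(range(n), key=lambda i: values[i])
    total = 0.0
    for k, idx in enumerate(order):
        upper = frozenset(order[k:])           # elements with value >= current
        upper_next = frozenset(order[k + 1:])  # same set without the current one
        # The weight of each value depends on the measure of a whole coalition.
        total += values[idx] * (mu(upper) - mu(upper_next))
    return total


def cardinality_measure(n, q):
    """Hypothetical fuzzy measure mu(A) = (|A| / n) ** q; additive only when q == 1."""
    def mu(subset):
        return (len(subset) / n) ** q
    return mu


# Hypothetical atom-level contributions for a three-atom fragment.
lovis = [0.2, 0.5, 0.9]
additive = choquet_integral(lovis, cardinality_measure(len(lovis), 1))
non_additive = choquet_integral(lovis, cardinality_measure(len(lovis), 2))
print(additive, non_additive)
```

With q = 1 the measure is additive and the integral reduces to the arithmetic mean, so the non-fuzzy operator appears as a special case. With q = 2 the measure is non-additive and the smaller values receive larger weights. A measure that depends on which atoms are in the subset, and not only on how many, would be needed to encode atom-specific interrelations.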
Overall, the Choquet integral constitutes a prominent alternative for computing fuzzy 2D/3D-MDs from LOVIs/LOEIs. It yields better characterizations of the compounds, which will ultimately be useful in enhancing the modelling ability of existing traditional 2D/3D-MDs.