Laboratoire d'Infochimie, UMR7177 CNRS, Université de Strasbourg, 4 Rue B. Pascal, Strasbourg Cedex 67008, France.
J Chem Inf Model. 2012 Sep 24;52(9):2325-38. doi: 10.1021/ci300149n. Epub 2012 Sep 4.
This work addresses similarity searching and classification of chemical reactions using the Neighborhood Behavior (NB) and Condensed Graphs of Reaction (CGR) approaches. The CGR formalism represents a chemical reaction as a classical molecular graph with dynamic bonds, so that descriptors can be calculated on this graph. Different types of ISIDA fragment descriptors generated for CGRs, in combination with two metrics (Tanimoto and Euclidean), were considered as chemical spaces for reaction dissimilarity scoring. The NB method was used to select an optimal combination of descriptors that distinguishes different types of chemical reactions in a database containing 8544 reactions of 9 classes. The relevance of the NB analysis was validated in generic (multiclass) similarity searching and in clustering with Self-Organizing Maps (SOM). NB-compliant descriptor sets were shown to display enhanced mapping propensities, allowing the construction of better Self-Organizing Maps and similarity searches (the NB criterion and the classical similarity search criterion, AUC ROC, correlate at a level of 0.7). Analysis of the SOM clusters revealed chemically meaningful CGR substructures representing specific reaction signatures.
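To make the CGR idea and the two metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes bonds are given as dictionaries keyed by atom-index pairs, uses a toy atom-bond-atom fragment scheme as a stand-in for the real ISIDA fragment descriptors, and uses the common count-vector form of the Tanimoto coefficient. The function names (`make_cgr`, `fragment_counts`, `tanimoto_dissimilarity`, `euclidean_distance`) and the two example reactions are hypothetical.

```python
from collections import Counter
from math import sqrt


def make_cgr(reactant_bonds, product_bonds):
    """Merge reactant and product bond tables into one graph with dynamic bonds.

    Each input maps frozenset({atom_i, atom_j}) to a bond order. Each output
    edge is labelled (order_before, order_after); 0 means "no bond".
    """
    edges = {}
    for pair in set(reactant_bonds) | set(product_bonds):
        edges[pair] = (reactant_bonds.get(pair, 0), product_bonds.get(pair, 0))
    return edges


def fragment_counts(cgr, atom_symbols):
    """Count atom-bond-atom fragments (a toy stand-in for ISIDA fragments)."""
    counts = Counter()
    for pair, (before, after) in cgr.items():
        a, b = sorted(atom_symbols[i] for i in pair)
        counts[f"{a}[{before}>{after}]{b}"] += 1
    return counts


def tanimoto_dissimilarity(x, y):
    """1 - Tanimoto coefficient for count vectors (assumed variant)."""
    keys = set(x) | set(y)
    dot = sum(x[k] * y[k] for k in keys)
    denom = sum(v * v for v in x.values()) + sum(v * v for v in y.values()) - dot
    return 1.0 - dot / denom if denom else 0.0


def euclidean_distance(x, y):
    """Euclidean distance between two sparse count vectors."""
    keys = set(x) | set(y)
    return sqrt(sum((x[k] - y[k]) ** 2 for k in keys))


# Hypothetical example: two halide exchanges on a methyl carbon (heavy atoms only).
# Reaction A: C-Cl bond broken, C-Br bond formed.
symbols_a = {1: "C", 2: "Cl", 3: "Br"}
cgr_a = make_cgr({frozenset({1, 2}): 1}, {frozenset({1, 3}): 1})
# Reaction B: C-Cl bond broken, C-I bond formed.
symbols_b = {1: "C", 2: "Cl", 3: "I"}
cgr_b = make_cgr({frozenset({1, 2}): 1}, {frozenset({1, 3}): 1})

desc_a = fragment_counts(cgr_a, symbols_a)
desc_b = fragment_counts(cgr_b, symbols_b)

print(dict(desc_a))
print(dict(desc_b))
print(tanimoto_dissimilarity(desc_a, desc_b))
print(euclidean_distance(desc_a, desc_b))
```

The two reactions share the broken C-Cl bond but differ in the bond formed, so both metrics report a nonzero but partial dissimilarity. Which descriptor and metric combination best separates reaction classes is the question the NB analysis is used to answer.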