Training similarity measures for specific activities: application to reduced graphs.

Authors

Birchall Kristian, Gillet Valerie J, Harper Gavin, Pickett Stephen D

Affiliation

Department of Information Studies, University of Sheffield, Western Bank, Sheffield S10 2TN, United Kingdom.

Publication information

J Chem Inf Model. 2006 Mar-Apr;46(2):577-86. doi: 10.1021/ci050465e.

Abstract

Reduced graph representations of chemical structures have been shown to be effective in similarity searching applications, where they offer performance comparable to other 2D descriptors in recall experiments. They have also been shown to complement existing descriptors and to offer the potential to scaffold hop from one chemical series to another. Various methods have been developed for quantifying the similarity between reduced graphs, including fingerprint approaches, graph matching, and an edit distance method. The edit distance approach quantifies the degree of similarity of two reduced graphs based on the number and type of operations required to convert one graph to the other. An attractive feature of the edit distance method is the ability to assign different weights to different operations. For example, the mutation of an aromatic ring node to an acyclic node may be assigned a higher weight than the mutation of an aromatic ring node to an aliphatic ring node. In this paper, we describe a genetic algorithm (GA) for training the weights of the different edit distance operations. The method is applied to specific activity classes extracted from the MDDR database to derive activity-class-specific weights. The GA-derived weights give substantially improved results in recall experiments compared with weights assigned by intuition. Furthermore, such activity-specific weights may provide useful structure-activity information for subsequent design efforts. In a virtual screening setting where few active compounds are known, it may be more useful to have weights that perform well across a variety of different activity classes. Thus, the GA is also trained on multiple activity classes simultaneously to derive a generalized set of weights. These more generally applicable weights also represent a substantial improvement on previous work.
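
The idea of a weighted edit distance whose operation weights are tuned by a GA to maximize recall can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on strong simplifying assumptions: each reduced graph is flattened to a plain sequence of node labels (the paper works on graphs), only three hypothetical node types are used ("Ar" aromatic ring, "Al" aliphatic ring, "Ac" acyclic), insertion and deletion share one weight, and the GA operators (elitist selection, uniform crossover, Gaussian mutation) and all function names and parameter values are illustrative choices rather than details taken from the paper.

```python
import random

# Hypothetical weight keys: one insert/delete cost plus one cost per
# unordered pair of node labels (Ar = aromatic, Al = aliphatic, Ac = acyclic).
KEYS = ["indel", ("Ac", "Al"), ("Ac", "Ar"), ("Al", "Ar")]


def edit_distance(a, b, w):
    """Weighted edit distance between two sequences of node labels."""
    prev = [j * w["indel"] for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * w["indel"]]
        for j, y in enumerate(b, start=1):
            sub = 0.0 if x == y else w[tuple(sorted((x, y)))]
            cur.append(min(prev[j] + w["indel"],     # delete x
                           cur[j - 1] + w["indel"],  # insert y
                           prev[j - 1] + sub))       # mutate x into y
        prev = cur
    return prev[-1]


def recall_at_k(query, database, w, k):
    """Fraction of all actives found among the k nearest database entries."""
    ranked = sorted(database, key=lambda item: edit_distance(query, item[0], w))
    total = sum(1 for _, active in database if active)
    found = sum(1 for _, active in ranked[:k] if active)
    return found / total if total else 0.0


def fitness(w, queries, database, k):
    """Mean recall over all query structures for one weight set."""
    return sum(recall_at_k(q, database, w, k) for q in queries) / len(queries)


def train_weights(queries, database, k=3, pop_size=20, generations=30, seed=0):
    """Evolve a weight set that maximizes mean recall (toy GA)."""
    rng = random.Random(seed)
    pop = [{key: rng.uniform(0.0, 3.0) for key in KEYS} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, queries, database, k), reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = {}
            for key in KEYS:
                value = rng.choice((p1[key], p2[key]))  # uniform crossover
                value += rng.gauss(0.0, 0.2)            # Gaussian mutation
                child[key] = min(3.0, max(0.0, value))  # keep weights in [0, 3]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda w: fitness(w, queries, database, k))
    return best, fitness(best, queries, database, k)


# Intuition-style weights: aromatic -> acyclic costs more than aromatic -> aliphatic.
intuitive = {"indel": 1.0, ("Ac", "Al"): 0.5, ("Ac", "Ar"): 2.0, ("Al", "Ar"): 0.3}
print(edit_distance(["Ar", "Ac"], ["Al", "Ac"], intuitive))
```

In this sketch, training on a single activity class corresponds to passing queries and actives from that class only; a generalized weight set could be approximated by averaging the fitness over several such classes, which is again an assumption about how the multi-class objective might be combined.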
