Random or rational design? Evaluation of diverse compound subsets from chemical structure databases.

Author information

Pötter T, Matter H

Affiliation

BAYER AG, Landwirtschaftszentrum, Monheim, Germany.

Publication information

J Med Chem. 1998 Feb 12;41(4):478-88. doi: 10.1021/jm9700878.

Abstract

The performance of rational design methods in maximizing the structural diversity of databases for lead finding and lead refinement was investigated. Rational methods for designing compound subsets, such as maximum dissimilarity selection and hierarchical cluster analysis, were compared with a random approach to assess how efficiently they enhance the diversity of three different databases. All investigations were based on 2D fingerprints as a validated molecular descriptor. To compare the performance of the rational selection methods with a random approach, we additionally used probability calculations. In maximum dissimilarity-based selections, a single compound can belong to several neighborhoods, as defined by the similarity threshold value, whereas in hierarchical clustering each compound is assigned to exactly one cluster. Therefore, the relationship between the similarity threshold of the maximum diversity selection method and a 2D similarity search threshold was studied. In contrast to hierarchical cluster analysis, maximum dissimilarity selections allow a similarity threshold to be applied when adding a new compound to the list of already selected compounds. Reasonable values for this similarity threshold are presented here. Maximum dissimilarity selections produced more diverse subsets, which cover more biological classes than random selections. An optimally diverse subset without redundant structures was generated that contains only 38% of one original dataset: no structure is more similar than 0.85 to its nearest neighbor, yet all biological classes are represented. When covering only 90% of all biological targets is acceptable, a random approach requires 3.5-3.7 times more compounds than a rational design approach. At this coverage rate, design techniques show their highest efficiency relative to a random approach. In those subsets, no compound is more similar than 0.70 to its nearest neighbor. Furthermore, a comparative molecular field analysis (CoMFA) was used to evaluate designed and randomly chosen subsets for a database consisting of inhibitors of the angiotensin-converting enzyme. Subsets designed with maximum dissimilarity methods led to more stable quantitative structure-activity relationship (QSAR) models with higher predictive power than randomly chosen compounds. This predictive power is especially high when no compound in the test dataset has a similarity coefficient below 0.7 to its nearest neighbor in the training set.
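
The threshold-limited maximum dissimilarity selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that fingerprints are represented as sets of "on" bit indices, that similarity is the Tanimoto coefficient (the abstract only says "2D fingerprints"), and that the first compound is a user-chosen seed. The names `tanimoto`, `max_dissimilarity_select`, and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / len(a | b)


def max_dissimilarity_select(
    fps: Dict[str, FrozenSet[int]], seed: str, threshold: float
) -> List[str]:
    """Greedy selection: repeatedly add the candidate that is least similar
    to the compounds already selected, and stop once every remaining
    candidate is more similar than `threshold` to some selected compound."""
    selected = [seed]
    # For each candidate, track its highest similarity to any selected compound.
    nearest = {
        name: tanimoto(fp, fps[seed]) for name, fp in fps.items() if name != seed
    }
    while nearest:
        best = min(nearest, key=nearest.get)
        if nearest[best] > threshold:
            break  # all remaining candidates are too close to the subset
        selected.append(best)
        del nearest[best]
        for name in nearest:
            nearest[name] = max(nearest[name], tanimoto(fps[name], fps[best]))
    return selected


# Toy fingerprints (made up for illustration, not data from the paper).
toy = {
    "a": frozenset({1, 2, 3, 4}),
    "b": frozenset({1, 2, 3, 5}),
    "c": frozenset({10, 11, 12}),
    "d": frozenset({1, 2, 3, 4, 6}),
}
print(max_dissimilarity_select(toy, seed="a", threshold=0.70))
```

By construction, no two compounds in the returned subset are more similar than `threshold`, which mirrors the 0.85 and 0.70 nearest-neighbor limits quoted in the abstract. Unlike a hierarchical clustering, a rejected compound may lie within the threshold of several selected compounds at once.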
