A hierarchical clustering approach for large compound libraries.

Author information

Böcker Alexander, Derksen Swetlana, Schmidt Elena, Teckentrup Andreas, Schneider Gisbert

Affiliation

Johann Wolfgang Goethe-Universität, Institut für Organische Chemie und Chemische Biologie, Marie-Curie-Str. 11, D-60439 Frankfurt, Germany.

Publication information

J Chem Inf Model. 2005 Jul-Aug;45(4):807-15. doi: 10.1021/ci0500029.

Abstract

A modified version of the k-means clustering algorithm was developed that is able to analyze large compound libraries. A distance threshold determined by plotting the sum of radii of leaf clusters was used as a termination criterion for the clustering process. Hierarchical trees were constructed that can be used to obtain an overview of the data distribution and inherent cluster structure. The approach is also applicable to ligand-based virtual screening with the aim to generate preferred screening collections or focused compound libraries. Retrospective analysis of two activity classes was performed: inhibitors of caspase 1 [interleukin 1 (IL1) cleaving enzyme, ICE] and glucocorticoid receptor ligands. The MDL Drug Data Report (MDDR) and Collection of Bioactive Reference Analogues (COBRA) databases served as the compound pool, for which binary trees were produced. Molecules were encoded by all Molecular Operating Environment 2D descriptors and topological pharmacophore atom types. Individual clusters were assessed for their purity and enrichment of actives belonging to the two ligand classes. Significant enrichment was observed in individual branches of the cluster tree. After clustering a combined database of MDDR, COBRA, and the SPECS catalog, it was possible to retrieve MDDR ICE inhibitors with new scaffolds using COBRA ICE inhibitors as seeds. A Java implementation of the clustering method is available via the Internet (http://www.modlab.de).
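The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one way a k-means variant can produce a binary cluster tree with a distance threshold as the stopping rule. It is not the authors' Java implementation. Everything in it is an assumption made for illustration: the function names, the Euclidean metric, the definition of a cluster radius as the largest member-to-centroid distance, the farthest-point seeding, and the rule that a cluster is split in two whenever its radius exceeds the threshold.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def radius(points):
    """Assumed definition: largest Euclidean distance from a member to the centroid."""
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def two_means(points, iterations=20):
    """Split points into two groups with 2-means; requires a radius above zero."""
    c = centroid(points)
    # Assumed seeding: the point farthest from the centroid, then the point farthest from it.
    first = max(points, key=lambda p: math.dist(p, c))
    second = max(points, key=lambda p: math.dist(p, first))
    centers = [first, second]
    groups = None
    for _ in range(iterations):
        new_groups = ([], [])
        for p in points:
            nearer = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            new_groups[nearer].append(p)
        # Stop if a group emptied out or the assignment no longer changes.
        if not new_groups[0] or not new_groups[1] or new_groups == groups:
            break
        groups = new_groups
        centers = [centroid(g) for g in groups]
    return groups


def build_tree(points, threshold):
    """Recursively bisect until every leaf cluster has radius <= threshold."""
    node = {"points": points, "radius": radius(points), "children": []}
    if len(points) > 1 and node["radius"] > max(threshold, 0.0):
        left, right = two_means(points)
        node["children"] = [build_tree(left, threshold), build_tree(right, threshold)]
    return node


def leaves(node):
    """Collect the leaf clusters of the tree from left to right."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def sum_of_leaf_radii(points, threshold):
    """Quantity one could plot against the threshold when choosing a cut-off."""
    return sum(leaf["radius"] for leaf in leaves(build_tree(points, threshold)))


# Toy descriptor vectors (hypothetical data): two well-separated groups.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
demo_tree = build_tree(demo_points, threshold=2.0)
demo_leaf_sizes = sorted(len(leaf["points"]) for leaf in leaves(demo_tree))
```

Calling `sum_of_leaf_radii` over a range of candidate thresholds and plotting the results is one plausible reading of how the termination threshold could be chosen. The descriptors, scaling, and exact criterion used in the paper may well differ.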
