NIPALSTREE: a new hierarchical clustering approach for large compound libraries and its application to virtual screening.

Authors

Böcker Alexander, Schneider Gisbert, Teckentrup Andreas

Affiliation

Institut für Organische Chemie und Chemische Biologie, Johann Wolfgang Goethe-Universität, Marie-Curie-Strasse 11, D-60439 Frankfurt, Germany.

Publication

J Chem Inf Model. 2006 Nov-Dec;46(6):2220-9. doi: 10.1021/ci050541d.

DOI:10.1021/ci050541d

PMID:17125166

Abstract

A hierarchical clustering algorithm, NIPALSTREE, was developed that is able to analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level the algorithm projects a data set via principal component analysis onto one dimension. The data set is sorted according to this one dimension and split at the median position. To avoid distortion of clusters at the median position, the algorithm identifies a potentially more suitable split point to the left or right of the median. The procedure is recursively applied to the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. Results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
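
The project, sort, and split recursion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, the first principal component is obtained with a plain NIPALS-style iteration, and the split is always taken exactly at the median (the search for a better split point left or right of the median is omitted). The stopping rule is an assumption: the sketch keeps splitting a subset while its maximal pairwise Euclidean distance is above the threshold.

```python
import math


def first_pc_scores(rows, iters=200, tol=1e-12):
    """Scores of mean-centered rows on the first principal component (NIPALS-style)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Start from the column with the largest variance.
    j0 = max(range(d), key=lambda j: sum(x[j] ** 2 for x in X))
    t = [x[j0] for x in X]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        if tt == 0:
            break
        # Loading vector p = X^T t / (t^T t), normalized to unit length.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        norm = math.sqrt(sum(v * v for v in p))
        if norm == 0:
            break
        p = [v / norm for v in p]
        # New score vector t = X p.
        t_new = [sum(X[i][j] * p[j] for j in range(d)) for i in range(n)]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t


def max_distance(rows):
    """Largest pairwise Euclidean distance (quadratic cost; fine for a sketch)."""
    return max(
        (math.dist(a, b) for i, a in enumerate(rows) for b in rows[i + 1:]),
        default=0.0,
    )


def build_tree(rows, threshold, indices=None):
    """Recursively split at the median of the first-PC scores."""
    if indices is None:
        indices = list(range(len(rows)))
    members = [rows[i] for i in indices]
    # Assumed stopping rule: stop once the subset is compact enough.
    if len(indices) < 2 or max_distance(members) <= threshold:
        return {"members": sorted(indices)}
    scores = first_pc_scores(members)
    order = sorted(range(len(indices)), key=lambda k: scores[k])
    mid = len(order) // 2  # median position; no left/right refinement here
    left = [indices[k] for k in order[:mid]]
    right = [indices[k] for k in order[mid:]]
    return {
        "left": build_tree(rows, threshold, left),
        "right": build_tree(rows, threshold, right),
    }


def leaves(node):
    """Collect the member lists of all leaf clusters, left to right."""
    if "members" in node:
        return [node["members"]]
    return leaves(node["left"]) + leaves(node["right"])


# Toy descriptor vectors: two compact groups far apart.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
tree = build_tree(data, threshold=1.0)
clusters = leaves(tree)
```

The nested dictionaries returned by `build_tree` correspond to the dendrogram mentioned in the abstract: each inner node is one median split, and each leaf holds the row indices of one cluster.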
