A fast clustering algorithm for analyzing highly similar compounds of very large libraries.

Author information

Li Weizhong

Affiliation

Burnham Institute for Medical Research, 10901 N. Torrey Pines Rd., La Jolla, California 92037, USA.

Publication information

J Chem Inf Model. 2006 Sep-Oct;46(5):1919-23. doi: 10.1021/ci0600859.

Abstract

As a result of recent developments in high-throughput screening for drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and not suitable for very large compound libraries. This paper describes a fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds. The method can cluster a very large data set of millions of compounds in hours on a single computer. A program implementing this method, called cd-hit-fp, is available from http://chemspace.org.
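
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what greedy incremental clustering over 2D fingerprints might look like. It rests on several assumptions that are not stated in the abstract: fingerprints are stored as Python integers used as bit sets, similarity is the Tanimoto coefficient, and each compound is compared only against the representative (first member) of each existing cluster. The names `tanimoto` and `incremental_cluster` are hypothetical and do not come from cd-hit-fp.

```python
from typing import List


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return bin(a & b).count("1") / union


def incremental_cluster(fingerprints: List[int], threshold: float) -> List[List[int]]:
    """Greedy incremental clustering (illustrative sketch only).

    Compounds are processed in input order. Each one joins the first cluster
    whose representative is at least `threshold` similar; otherwise it starts
    a new cluster and becomes that cluster's representative.
    Returns clusters as lists of input indices, representative first.
    """
    clusters: List[List[int]] = []
    for idx, fp in enumerate(fingerprints):
        for cluster in clusters:
            representative = fingerprints[cluster[0]]
            if tanimoto(fp, representative) >= threshold:
                cluster.append(idx)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append([idx])
    return clusters


# Toy 8-bit fingerprints, purely for illustration.
toy = [0b11110000, 0b11100000, 0b00001111, 0b00000111, 0b11111111]
print(incremental_cluster(toy, threshold=0.7))  # [[0, 1], [2, 3], [4]]
```

Because every compound is compared only with cluster representatives rather than with all other compounds, the number of comparisons grows with the number of clusters instead of with the square of the library size, which is the kind of saving that would make such an approach practical for highly redundant libraries.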

