
Blocked inverted indices for exact clustering of large chemical spaces.

Authors

Thiel Philipp, Sach-Peltason Lisa, Ottmann Christian, Kohlbacher Oliver

Affiliation

Applied Bioinformatics, Center for Bioinformatics, Quantitative Biology Center and Dept. of Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany.

Publication

J Chem Inf Model. 2014 Sep 22;54(9):2395-401. doi: 10.1021/ci500150t. Epub 2014 Sep 2.

DOI: 10.1021/ci500150t
PMID: 25136755
Abstract

The calculation of pairwise compound similarities based on fingerprints is one of the fundamental tasks in chemoinformatics. Methods for efficient calculation of compound similarities are of the utmost importance for various applications such as similarity searching or library clustering. With the increasing size of public compound databases, exact clustering of these databases is desirable, but often prohibitively expensive computationally. We present an optimized inverted index algorithm for the calculation of all pairwise similarities on 2D fingerprints of a given data set. In contrast to other algorithms, it neither requires GPU computing nor yields a stochastic approximation of the clustering. The algorithm has been designed to work well with multicore architectures and shows excellent parallel speedup. As an application example of this algorithm, we implemented a deterministic clustering application, which has been designed to decompose virtual libraries comprising tens of millions of compounds in a short time on current hardware. Our results show that our implementation achieves more than 400 million Tanimoto similarity calculations per second on a common desktop CPU. Deterministic clustering of the available chemical space can thus be done on modern multicore machines within a few days.
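The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions: each 2D fingerprint is held as a Python set of its on-bit indices, "neighbours" are pairs whose Tanimoto similarity reaches a chosen threshold, and the function names `neighbours_via_inverted_index` and `taylor_butina` are hypothetical. An inverted index maps each bit position to the compounds that have that bit set, so each compound accumulates shared-bit counts c only with compounds it actually overlaps; the Tanimoto coefficient then follows as c / (|A| + |B| - c) without a full bitwise comparison of every pair. The blocking and multicore parallelization behind the paper's reported throughput are deliberately omitted, and the clustering step is a Taylor-Butina-style sphere exclusion, chosen here as one common deterministic method; the abstract does not say which clustering procedure the authors implemented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

# Assumption: a 2D fingerprint is represented as the set of indices of its "on" bits.
Fingerprint = Set[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit-set fingerprints."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0  # two empty fingerprints -> 0.0 (a convention)


def neighbours_via_inverted_index(
    fps: Sequence[Fingerprint], threshold: float
) -> Dict[int, List[int]]:
    """Hypothetical helper: all pairs with Tanimoto >= threshold via an inverted index.

    Plain (unblocked, single-threaded) version: each compound only visits
    compounds sharing at least one on-bit with it. Pairs sharing no bits
    have similarity 0, so the threshold must be positive.
    """
    if threshold <= 0:
        raise ValueError("threshold must be > 0")
    # Inverted index: bit position -> ids of compounds with that bit set.
    index: Dict[int, List[int]] = defaultdict(list)
    for cid, fp in enumerate(fps):
        for bit in fp:
            index[bit].append(cid)

    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(fps))}
    for i, fp in enumerate(fps):
        shared: Dict[int, int] = defaultdict(int)  # j -> number of bits shared with i
        for bit in fp:
            for j in index[bit]:
                if j > i:  # count each unordered pair only once
                    shared[j] += 1
        for j, common in shared.items():
            union = len(fp) + len(fps[j]) - common
            if common / union >= threshold:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours


def taylor_butina(neighbours: Dict[int, List[int]]) -> List[List[int]]:
    """Deterministic sphere-exclusion clustering (Taylor-Butina style, hypothetical helper).

    Compounds are visited by decreasing neighbour count, ties broken by id,
    so the partition is reproducible. Each still-unassigned compound becomes
    a centroid and absorbs its still-unassigned neighbours.
    """
    order = sorted(neighbours, key=lambda i: (-len(neighbours[i]), i))
    assigned: Set[int] = set()
    clusters: List[List[int]] = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + [j for j in neighbours[centroid] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}, {20}]
    nbrs = neighbours_via_inverted_index(fps, threshold=0.5)
    print(taylor_butina(nbrs))  # [[0, 1], [2, 3], [4]]
```

Because the visiting order is fixed by neighbour count and compound id, repeated runs produce the same partition, which illustrates the kind of reproducibility that separates a deterministic clustering from a stochastic approximation.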


Similar articles

1. Blocked inverted indices for exact clustering of large chemical spaces.
   J Chem Inf Model. 2014 Sep 22;54(9):2395-401. doi: 10.1021/ci500150t. Epub 2014 Sep 2.
2. Accelerating two algorithms for large-scale compound selection on GPUs.
   J Chem Inf Model. 2011 May 23;51(5):1017-24. doi: 10.1021/ci200061p. Epub 2011 Apr 28.
3. A fast clustering algorithm for analyzing highly similar compounds of very large libraries.
   J Chem Inf Model. 2006 Sep-Oct;46(5):1919-23. doi: 10.1021/ci0600859.
4. GPU-accelerated Chemical Similarity Assessment for Large Scale Databases.
   Procedia Comput Sci. 2011;4:2007-2016. doi: 10.1016/j.procs.2011.04.219. Epub 2011 May 14.
5. GPU accelerated chemical similarity calculation for compound library comparison.
   J Chem Inf Model. 2011 Jul 25;51(7):1521-7. doi: 10.1021/ci1004948. Epub 2011 Jul 1.
6. Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing.
   Bioinformatics. 2010 Apr 1;26(7):953-9. doi: 10.1093/bioinformatics/btq067. Epub 2010 Feb 23.
7. Multi-view spectral clustering and its chemical application.
   Int J Comput Biol Drug Des. 2013;6(1-2):32-49. doi: 10.1504/IJCBDD.2013.052200. Epub 2013 Feb 21.
8. SymDex: increasing the efficiency of chemical fingerprint similarity searches for comparing large chemical libraries by using query set indexing.
   J Chem Inf Model. 2012 Aug 27;52(8):1926-35. doi: 10.1021/ci200606t. Epub 2012 Aug 7.
9. Parallel hash-based EST clustering algorithm for gene sequencing.
   DNA Cell Biol. 2004 Oct;23(10):615-23. doi: 10.1089/dna.2004.23.615.
10. QuBiLS-MIDAS: a parallel free-software for molecular descriptors computation based on multilinear algebraic maps.
   J Comput Chem. 2014 Jul 5;35(18):1395-409. doi: 10.1002/jcc.23640. Epub 2014 Jun 2.

Cited by

1. BitBIRCH: efficient clustering of large molecular libraries.
   Digit Discov. 2025 Mar 13;4(4):1042-1051. doi: 10.1039/d5dd00030k. eCollection 2025 Apr 9.
2. Efficient clustering of large molecular libraries.
   bioRxiv. 2024 Aug 10:2024.08.10.607459. doi: 10.1101/2024.08.10.607459.
3. The chemfp project.
   J Cheminform. 2019 Dec 5;11(1):76. doi: 10.1186/s13321-019-0398-8.