A tree-based method for the rapid screening of chemical fingerprints.

Author information

Kristensen Thomas G, Nielsen Jesper, Pedersen Christian N S

Affiliations

Bioinformatics Research Center, Aarhus University, CF Møllers Allé 8, DK-8000 Arhus C, Denmark.

Publication information

Algorithms Mol Biol. 2010 Jan 4;5:9. doi: 10.1186/1748-7188-5-9.

DOI:10.1186/1748-7188-5-9

PMID:20047665

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2830925/

Abstract

BACKGROUND

The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint.
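
Screening of this kind amounts to scoring the query against every stored fingerprint and keeping the close ones. As a rough sketch only, assuming fingerprints are stored as plain Python ints (one bit per structural feature) and using the Tanimoto coefficient |A ∩ B| / |A ∪ B| that the Results section refers to, a baseline linear scan might look like the following. The function names are our own, and reading "above the threshold" as "at least the threshold" is likewise an assumption.

```python
from typing import List


def popcount(x: int) -> int:
    """Number of 1-bits in a fingerprint stored as a non-negative Python int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bitstring fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return popcount(a & b) / union


def linear_scan(database: List[int], query: int, threshold: float) -> List[int]:
    """Baseline screen: indices of all fingerprints with Tanimoto >= threshold to query."""
    return [i for i, fp in enumerate(database) if tanimoto(query, fp) >= threshold]


db = [0b1011, 0b1001, 0b0110]
print(linear_scan(db, 0b1011, 0.6))  # [0, 1]
```

Every query touches every fingerprint, so the cost grows linearly with database size; the data structures described in the Results aim to skip most of those comparisons.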

RESULTS

In this paper, we present a method that efficiently finds all fingerprints in a database whose Tanimoto coefficient with the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid splits the fingerprints into k shorter bitstrings and uses these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and the similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods.
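
The abstract only summarises how the kD grid derives its bounds. The following is a minimal Python sketch of the idea under our own assumptions, not the authors' implementation: fingerprints are Python ints of n_bits bits, they are split into k contiguous chunks, and a grid cell is simply the tuple of per-chunk popcounts. Because |A ∩ B| ≤ Σ min(|A_i|, |B_i|) and |A ∪ B| ≥ Σ max(|A_i|, |B_i|), the ratio of these sums bounds the Tanimoto coefficient from above, so any cell whose bound falls below the threshold can be skipped without losing a hit. The helper names (split_counts, partition_bound, build_grid, grid_search) are hypothetical, and the paper's kD grid and Multibit tree bounds may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Grid = Dict[Tuple[int, ...], List[int]]


def popcount(x: int) -> int:
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    union = popcount(a | b)
    return 1.0 if union == 0 else popcount(a & b) / union


def split_counts(fp: int, n_bits: int, k: int) -> Tuple[int, ...]:
    """Popcounts of k contiguous chunks of an n_bits-long fingerprint."""
    size = -(-n_bits // k)  # ceiling division so every bit falls in some chunk
    mask = (1 << size) - 1
    return tuple(popcount((fp >> (i * size)) & mask) for i in range(k))


def partition_bound(qc: Sequence[int], fc: Sequence[int]) -> float:
    """Upper bound on Tanimoto(A, B) computed from per-chunk popcounts alone.

    Per chunk, |A_i & B_i| <= min(|A_i|, |B_i|) and |A_i | B_i| >= max(|A_i|, |B_i|),
    so summing over chunks over-estimates the intersection and under-estimates the union.
    """
    upper_inter = sum(min(x, y) for x, y in zip(qc, fc))
    lower_union = sum(max(x, y) for x, y in zip(qc, fc))
    return 1.0 if lower_union == 0 else upper_inter / lower_union


def build_grid(database: List[int], n_bits: int, k: int) -> Grid:
    """Group fingerprint indices into cells keyed by their per-chunk popcount tuple."""
    grid: Grid = defaultdict(list)
    for i, fp in enumerate(database):
        grid[split_counts(fp, n_bits, k)].append(i)
    return grid


def grid_search(database: List[int], grid: Grid, query: int,
                threshold: float, n_bits: int, k: int) -> List[int]:
    """Exact threshold search that skips whole cells whose bound is below threshold."""
    qc = split_counts(query, n_bits, k)
    hits: List[int] = []
    for cell, members in grid.items():
        if partition_bound(qc, cell) < threshold:
            continue  # no fingerprint in this cell can reach the threshold
        hits.extend(i for i in members if tanimoto(query, database[i]) >= threshold)
    return sorted(hits)


db = [0b10110000, 0b10010000, 0b01100001]
grid = build_grid(db, n_bits=8, k=2)
print(grid_search(db, grid, 0b10110000, 0.6, n_bits=8, k=2))  # [0, 1]
```

In the usage example the third fingerprint sits in a cell with bound 0.5, so it is discarded without ever computing its exact similarity. The results stay identical to a linear scan because the bound never underestimates the true coefficient; only the number of exact comparisons changes.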

CONCLUSIONS

Using the novel kD grid and Multibit tree significantly reduces the time needed for searching databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) easily search large databases.

Similar articles

A tree-based method for the rapid screening of chemical fingerprints.

Algorithms Mol Biol. 2010 Jan 4;5:9. doi: 10.1186/1748-7188-5-9.

Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.

J Cheminform. 2015 Feb 10;7:3. doi: 10.1186/s13321-014-0051-5. eCollection 2015.

Bounds and algorithms for fast exact searches of chemical fingerprints in linear and sublinear time.

J Chem Inf Model. 2007 Mar-Apr;47(2):302-17. doi: 10.1021/ci600358f. Epub 2007 Feb 28.

Hashing algorithms and data structures for rapid searches of fingerprint vectors.

J Chem Inf Model. 2010 Aug 23;50(8):1358-68. doi: 10.1021/ci100132g.

Database fingerprint (DFP): an approach to represent molecular databases.

J Cheminform. 2017 Feb 6;9:9. doi: 10.1186/s13321-017-0195-1. eCollection 2017.

A multi-fingerprint browser for the ZINC database.

Nucleic Acids Res. 2014 Jul;42(Web Server issue):W234-9. doi: 10.1093/nar/gku379. Epub 2014 Apr 29.

Modeling Tanimoto Similarity Value Distributions and Predicting Search Results.

Mol Inform. 2017 Jul;36(7). doi: 10.1002/minf.201600131. Epub 2016 Dec 29.

One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome.

J Cheminform. 2020 Jun 12;12(1):43. doi: 10.1186/s13321-020-00445-4.

Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods.

J Mol Graph Model. 2010 Sep;29(2):157-70. doi: 10.1016/j.jmgm.2010.05.008. Epub 2010 May 25.

Do Molecular Fingerprints Identify Diverse Active Drugs in Large-Scale Virtual Screening? (No).

Pharmaceuticals (Basel). 2024 Jul 26;17(8):992. doi: 10.3390/ph17080992.

Cited by

Electrophysiological and Behavioral Responses of (Coleoptera: Curculionidae) to Plant Volatiles.

Plants (Basel). 2024 Dec 26;14(1):42. doi: 10.3390/plants14010042.

Python tools for structural tasks in chemistry.

Mol Divers. 2024 May 14. doi: 10.1007/s11030-024-10889-7.

Microsimulation of an educational attainment register to predict future record linkage quality.

Int J Popul Data Sci. 2023 Apr 3;8(1):2122. doi: 10.23889/ijpds.v8i1.2122. eCollection 2023.

S2DV: converting SMILES to a drug vector for predicting the activity of anti-HBV small molecules.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab593.

The chemfp project.

J Cheminform. 2019 Dec 5;11(1):76. doi: 10.1186/s13321-019-0398-8.

Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets.

BMC Med Inform Decis Mak. 2017 Jun 8;17(1):83. doi: 10.1186/s12911-017-0478-5.

Comput Struct Biotechnol J. 2013 Mar 3;5:e201302009. doi: 10.5936/csbj.201302009. eCollection 2013.

Quantification of hormone sensitive lipase phosphorylation and colocalization with lipid droplets in murine 3T3L1 and human subcutaneous adipocytes via automated digital microscopy and high-content analysis.

Assay Drug Dev Technol. 2011 Jun;9(3):262-80. doi: 10.1089/adt.2010.0302. Epub 2010 Dec 27.
