One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome.

Authors

Capecchi Alice, Probst Daniel, Reymond Jean-Louis

Affiliation

Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.

Publication

J Cheminform. 2020 Jun 12;12(1):43. doi: 10.1186/s13321-020-00445-4.

Abstract

BACKGROUND

Molecular fingerprints are essential cheminformatics tools for virtual screening and mapping chemical space. Among the different types of fingerprints, substructure fingerprints perform best for small molecules such as drugs, while atom-pair fingerprints are preferable for large molecules such as peptides. However, no available fingerprint achieves good performance on both classes of molecules.

RESULTS

Here we set out to design a new fingerprint suitable for both small and large molecules by combining substructure and atom-pair concepts. The result is the MinHashed atom-pair fingerprint up to a diameter of four bonds (MAP4). In this fingerprint, the circular substructures with radii of r = 1 and r = 2 bonds around each atom in an atom-pair are written as two pairs of SMILES, each pair being combined with the topological distance separating the two central atoms. These so-called atom-pair molecular shingles are hashed, and the resulting set of hashes is MinHashed to form the MAP4 fingerprint. MAP4 significantly outperforms all other fingerprints on an extended benchmark that combines the Riniker and Landrum small molecule benchmark with a peptide benchmark recovering BLAST analogs from either scrambled or point mutation analogs. MAP4 furthermore produces well-organized chemical space tree-maps (TMAPs) for databases as diverse as DrugBank, ChEMBL, SwissProt and the Human Metabolome Database (HMDB), and differentiates between all metabolites in HMDB, over 70% of which are indistinguishable from their nearest neighbor using substructure fingerprints.
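
The shingling and MinHash steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository) and it uses only the standard library. The following are assumptions made for illustration: the circular-substructure SMILES are hand-written placeholders rather than output of a cheminformatics toolkit, the `a|distance|b` shingle layout and the sorting of the two SMILES are one plausible way to make a pair order-independent, and the function names, SHA-1 hashing, 128 permutations and seed are all hypothetical choices.

```python
import hashlib
import random
from itertools import combinations

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the universal hash family
MAX_HASH = (1 << 32) - 1        # keep MinHash values in 32 bits


def atom_pair_shingles(env_smiles, distances, radii=(1, 2)):
    """Build atom-pair shingles.

    env_smiles: {atom_index: {radius: substructure SMILES}}
    distances:  {(j, k): topological distance in bonds}, with j < k
    """
    shingles = set()
    for j, k in combinations(sorted(env_smiles), 2):
        dist = distances[(j, k)]
        for radius in radii:
            # Assumption: sort the two SMILES so (j, k) and (k, j) agree.
            a, b = sorted((env_smiles[j][radius], env_smiles[k][radius]))
            shingles.add(f"{a}|{dist}|{b}")
    return shingles


def hash_shingle(shingle):
    """Map a shingle string to a 32-bit integer via SHA-1 (an assumption)."""
    digest = hashlib.sha1(shingle.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def minhash(shingles, n_perm=128, seed=42):
    """Return the minimum of each of n_perm random hash functions."""
    if not shingles:
        raise ValueError("cannot MinHash an empty shingle set")
    rng = random.Random(seed)
    params = [
        (rng.randrange(1, MERSENNE_PRIME), rng.randrange(0, MERSENNE_PRIME))
        for _ in range(n_perm)
    ]
    hashes = [hash_shingle(s) for s in shingles]
    return [
        min(((a * h + b) % MERSENNE_PRIME) & MAX_HASH for h in hashes)
        for a, b in params
    ]


def estimated_jaccard(fp_a, fp_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(fp_a, fp_b)) / len(fp_a)


# Toy three-atom chain C-C-O; the environment strings are placeholders.
ethanol_envs = {
    0: {1: "CC", 2: "CCO"},
    1: {1: "CCO", 2: "CCO"},
    2: {1: "CO", 2: "CCO"},
}
ethanol_dists = {(0, 1): 1, (1, 2): 1, (0, 2): 2}

shingles = atom_pair_shingles(ethanol_envs, ethanol_dists)
fingerprint = minhash(shingles)
```

Two fingerprints built with the same `n_perm` and `seed` can then be compared with `estimated_jaccard`, which is what makes a MinHashed fingerprint usable for nearest-neighbor search without storing the full shingle sets.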

CONCLUSION

MAP4 is a new molecular fingerprint suitable for drugs, biomolecules, and the metabolome and can be adopted as a universal fingerprint to describe and search chemical space. The source code is available at https://github.com/reymond-group/map4 and interactive MAP4 similarity search tools and TMAPs for various databases are accessible at http://map-search.gdb.tools/ and http://tm.gdb.tools/map4/.
