Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications.

Affiliations

Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany.

Publication information

Molecules. 2021 Aug 31;26(17):5291. doi: 10.3390/molecules26175291.

PMID:34500724

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8433811/

Abstract

Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
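The core-indexing idea behind matched molecular pair and matched molecular series extraction can be illustrated with a minimal sketch. It assumes each compound has already been fragmented at single bonds into (core, substituent) pairs; the function name `find_analogue_series`, the `min_size` parameter, and the toy SMILES-like strings are hypothetical and only serve the illustration. Real implementations also canonicalize fragments and apply size rules to decide which fragment counts as the core.

```python
from collections import defaultdict


def find_analogue_series(fragmentations, min_size=2):
    """Group compounds that share a core and differ at a single site.

    fragmentations: dict mapping a compound ID to an iterable of
    (core, substituent) pairs, one pair per single-bond cut.
    Returns a dict mapping each shared core to a sorted list of
    (compound_id, substituent) tuples.
    """
    # Index every putative core to the compounds that can produce it.
    index = defaultdict(set)
    for compound_id, pairs in fragmentations.items():
        for core, substituent in pairs:
            index[core].add((compound_id, substituent))

    series = {}
    for core, members in index.items():
        # Count distinct compounds so that one compound yielding the
        # same core from two different cuts is not counted twice.
        if len({cid for cid, _ in members}) >= min_size:
            series[core] = sorted(members)
    return series


# Toy, hand-written fragmentations (assumed input, not computed here).
toy = {
    "cpd1": [("c1ccccc1[*]", "[*]C"), ("C[*]", "[*]c1ccccc1")],
    "cpd2": [("c1ccccc1[*]", "[*]Cl"), ("Cl[*]", "[*]c1ccccc1")],
    "cpd3": [("c1ccccc1[*]", "[*]O"), ("O[*]", "[*]c1ccccc1")],
    "cpd4": [("C1CCCCC1[*]", "[*]N"), ("N[*]", "[*]C1CCCCC1")],
}

series = find_analogue_series(toy)
for core, members in series.items():
    print(core, members)
```

Because every compound is looked up through all of its putative cores, no core structure has to be defined in advance, and the cost grows with the number of fragments rather than with the number of compound pairs.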

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a222/8433811/bc173f928606/molecules-26-05291-g001.jpg

Similar articles

Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications.

Molecules. 2021 Aug 31;26(17):5291. doi: 10.3390/molecules26175291.

Systematic Extraction of Analogue Series from Large Compound Collections Using a New Computational Compound-Core Relationship Method.

ACS Omega. 2019 Jan 14;4(1):1027-1032. doi: 10.1021/acsomega.8b03390. eCollection 2019 Jan 31.

Toward an efficient approach to identify molecular scaffolds possessing selective or promiscuous compounds.

Chem Biol Drug Des. 2013 Oct;82(4):367-75. doi: 10.1111/cbdd.12162. Epub 2013 Sep 10.

Method for Systematic Analogue Search Using the Mega SAR Matrix Database.

J Chem Inf Model. 2019 Sep 23;59(9):3727-3734. doi: 10.1021/acs.jcim.9b00557. Epub 2019 Aug 30.

Computational Exploration of Molecular Scaffolds in Medicinal Chemistry.

J Med Chem. 2016 May 12;59(9):4062-76. doi: 10.1021/acs.jmedchem.5b01746. Epub 2016 Feb 3.

Engineering Aspects of Olfaction

A general approach for retrosynthetic molecular core analysis.

J Cheminform. 2019 Sep 24;11(1):61. doi: 10.1186/s13321-019-0380-5.

Computational Method for the Systematic Identification of Analog Series and Key Compounds Representing Series and Their Biological Activity Profiles.

J Med Chem. 2016 Aug 25;59(16):7667-76. doi: 10.1021/acs.jmedchem.6b00906. Epub 2016 Aug 8.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Computational design of new molecular scaffolds for medicinal chemistry, part II: generalization of analog series-based scaffolds.

Future Sci OA. 2017 Nov 30;4(2):FSO267. doi: 10.4155/fsoa-2017-0102. eCollection 2018 Feb.

Cited by

SPLIF-Enhanced Attention-Driven 3D CNNs for Precise and Reliable Protein-Ligand Interaction Modeling for METTL3.

ACS Omega. 2025 Apr 16;10(16):16748-16761. doi: 10.1021/acsomega.5c00538. eCollection 2025 Apr 29.

Cheminformatics and artificial intelligence for accelerating agrochemical discovery.

Front Chem. 2023 Nov 29;11:1292027. doi: 10.3389/fchem.2023.1292027. eCollection 2023.

ZINC-22─A Free Multi-Billion-Scale Database of Tangible Compounds for Ligand Discovery.

J Chem Inf Model. 2023 Feb 27;63(4):1166-1176. doi: 10.1021/acs.jcim.2c01253. Epub 2023 Feb 15.

Data-Driven Approaches Used for Compound Library Design for the Treatment of Parkinson's Disease.

Int J Mol Sci. 2023 Jan 6;24(2):1134. doi: 10.3390/ijms24021134.

Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK).

J Cheminform. 2022 Nov 10;14(1):79. doi: 10.1186/s13321-022-00656-x.
