Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications.

Affiliations

Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico.

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany.

Publication information

Molecules. 2021 Aug 31;26(17):5291. doi: 10.3390/molecules26175291.

Abstract

Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
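The indexing idea behind matched molecular series can be illustrated with a minimal Python sketch. It is not the implementation of any method covered by the review. It assumes each compound has already been cut at single bonds into (core, substituent) pairs by a cheminformatics toolkit; the fragment SMILES, the compound identifiers, and the helper name `group_by_core` are all hypothetical. Compounds that share a putative core and differ only in the substituent at one site are collected into one candidate series.

```python
from collections import defaultdict

# Hypothetical, hand-written single-cut fragmentations:
# (compound id, putative core, substituent). A real workflow would
# generate many such pairs per compound by systematic fragmentation.
FRAGMENTATIONS = [
    ("cpd1", "c1ccccc1[*:1]", "[*:1]C"),
    ("cpd2", "c1ccccc1[*:1]", "[*:1]Cl"),
    ("cpd3", "c1ccccc1[*:1]", "[*:1]OC"),
    ("cpd4", "c1ccncc1[*:1]", "[*:1]C"),
    # The same compound can contribute more than one putative core.
    ("cpd1", "C[*:1]", "[*:1]c1ccccc1"),
]


def group_by_core(fragmentations, min_size=2):
    """Index compounds by core and keep cores shared by at least min_size compounds."""
    index = defaultdict(dict)
    for compound_id, core, substituent in fragmentations:
        index[core][compound_id] = substituent
    return {
        core: members
        for core, members in index.items()
        if len(members) >= min_size
    }


series = group_by_core(FRAGMENTATIONS)
for core, members in series.items():
    print(core, members)
```

In this toy input only the phenyl core is shared by more than one compound, so it yields the single candidate series. The dictionary of substituents per compound corresponds to one column of an R-group table.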

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a222/8433811/bc173f928606/molecules-26-05291-g001.jpg
