Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning.

Affiliation

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany.

Publication information

Molecules. 2022 Apr 4;27(7):2331. doi: 10.3390/molecules27072331.

Abstract

Fingerprint (FP) representations of chemical structure remain among the most widely used molecular descriptors in chemoinformatics and computational medicinal chemistry. FPs are commonly classified as two-dimensional (2D) or three-dimensional (3D), depending on whether they are derived from molecular graphs or from conformations, respectively. Their primary applications are similarity searching and compound classification via machine learning, especially for hit identification. For these tasks, 2D FPs are particularly popular because they are robust and, for the most part, perform comparably to (or better than) 3D FPs. Although a variety of FP prototypes were designed and evaluated in the earlier days of chemoinformatics research, new developments have been rare over the past decade. This is at least partly because topological (atom environment) FPs derived from molecular graphs have become the gold standard in the field. We asked whether the amount of structural information captured by state-of-the-art 2D FPs is actually required for effective similarity searching and compound classification, or whether fewer structural features might suffice. Pursuing a "structural minimalist" approach, we therefore designed and implemented a new 2D FP based on ring and substituent fragments obtained by systematically decomposing large numbers of compounds from medicinal chemistry. The resulting core-substituent FP (CSFP) captures far fewer structural features than state-of-the-art 2D FPs. Nevertheless, CSFP achieves high performance in similarity searching and machine learning, demonstrating that less structural information is required to establish molecular similarity relationships than is often believed. Given its high performance and chemical tangibility, CSFP is also relevant for practical applications in medicinal chemistry.
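The abstract describes CSFP only at a high level: compounds are decomposed into ring (core) and substituent fragments, and the FP records which of those fragments a compound contains. The Python sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. The fragment vocabulary, the SMILES-like fragment labels, and the function names (`core_substituent_fp`, `tanimoto`, `similarity_search`, `to_bit_vector`) are hypothetical; the decomposition step itself is not shown (each compound is supplied as a pre-computed fragment list); and the Tanimoto coefficient, |A ∩ B| / |A ∪ B|, is assumed as the similarity metric because it is a common choice for binary FPs.

```python
from typing import Dict, List, Sequence, Set, Tuple

# HYPOTHETICAL fragment vocabulary. In CSFP, ring and substituent fragments come
# from systematically decomposing large compound collections; these few
# SMILES-like labels are illustrative placeholders only.
FRAGMENT_VOCABULARY: List[str] = [
    "c1ccccc1",   # benzene ring (core)
    "c1ccncc1",   # pyridine ring (core)
    "C1CCNCC1",   # piperidine ring (core)
    "[*]Cl",      # chloro substituent
    "[*]OC",      # methoxy substituent
    "[*]C(=O)N",  # carboxamide substituent
]
# Each vocabulary fragment owns exactly one bit position.
BIT_INDEX: Dict[str, int] = {frag: i for i, frag in enumerate(FRAGMENT_VOCABULARY)}


def core_substituent_fp(fragments: Sequence[str]) -> Set[int]:
    """Return the set of 'on' bits for a compound's ring/substituent fragments.

    Fragments absent from the vocabulary are ignored, so the FP length is
    fixed by the vocabulary size.
    """
    return {BIT_INDEX[f] for f in fragments if f in BIT_INDEX}


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for binary FPs stored as bit sets."""
    union = a | b
    if not union:
        return 0.0  # two empty FPs: define similarity as 0
    return len(a & b) / len(union)


def to_bit_vector(bits: Set[int], n_bits: int = len(FRAGMENT_VOCABULARY)) -> List[int]:
    """Dense fixed-length 0/1 vector, e.g. as features for a machine learning model."""
    return [1 if i in bits else 0 for i in range(n_bits)]


def similarity_search(
    query: Sequence[str], database: Dict[str, Sequence[str]]
) -> List[Tuple[str, float]]:
    """Rank database compounds by Tanimoto similarity to the query, highest first."""
    q_fp = core_substituent_fp(query)
    scores = [
        (name, tanimoto(q_fp, core_substituent_fp(frags)))
        for name, frags in database.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy compounds given as pre-decomposed fragment lists (hypothetical data).
    database = {
        "cpd_A": ["c1ccccc1", "[*]Cl", "[*]OC"],
        "cpd_B": ["c1ccncc1", "[*]C(=O)N"],
        "cpd_C": ["c1ccccc1", "C1CCNCC1", "[*]Cl"],
    }
    query = ["c1ccccc1", "[*]Cl"]
    for name, score in similarity_search(query, database):
        print(f"{name}: {score:.2f}")
```

Because every bit corresponds to one named ring or substituent, any similarity score in this sketch can be traced back to the fragments two compounds share, which is in the spirit of the chemical tangibility the abstract emphasizes. The same bit vectors could be passed to any standard classifier for the machine learning use case.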

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/7115fdbf40e8/molecules-27-02331-g001.jpg