Introducing a Chemically Intuitive Core-Substituent Fingerprint Designed to Explore Structural Requirements for Effective Similarity Searching and Machine Learning.

Affiliation

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany.

Publication information

Molecules. 2022 Apr 4;27(7):2331. doi: 10.3390/molecules27072331.

DOI: 10.3390/molecules27072331
PMID: 35408730
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9000322/
Abstract

Fingerprint (FP) representations of chemical structure continue to be one of the most widely used types of molecular descriptors in chemoinformatics and computational medicinal chemistry. One often distinguishes between two- and three-dimensional (2D and 3D) FPs depending on whether they are derived from molecular graphs or conformations, respectively. Primary application areas for FPs include similarity searching and compound classification via machine learning, especially for hit identification. For these applications, 2D FPs are particularly popular, given their robustness and for the most part comparable (or better) performance to 3D FPs. While a variety of FP prototypes has been designed and evaluated during earlier times of chemoinformatics research, new developments have been rare over the past decade. At least in part, this has been due to the situation that topological (atom environment) FPs derived from molecular graphs have evolved as a gold standard in the field. We were interested in exploring the question of whether the amount of structural information captured by state-of-the-art 2D FPs is indeed required for effective similarity searching and compound classification or whether accounting for fewer structural features might be sufficient. Therefore, pursuing a "structural minimalist" approach, we designed and implemented a new 2D FP based upon ring and substituent fragments obtained by systematically decomposing large numbers of compounds from medicinal chemistry. The resulting FP termed core-substituent FP (CSFP) captures much smaller numbers of structural features than state-of-the-art 2D FPs. However, CSFP achieves high performance in similarity searching and machine learning, demonstrating that less structural information is required for establishing molecular similarity relationships than is often believed. Given its high performance and chemical tangibility, CSFP is also relevant for practical applications in medicinal chemistry.
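
As a rough illustration of similarity searching over a fragment-based fingerprint, the sketch below treats each compound as a set of ring and substituent features and ranks a small library against a query. It is not the CSFP implementation: the feature labels, compound names, and the choice of the Tanimoto coefficient are assumptions made for this example, and the abstract does not state which similarity metric the authors used.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: FrozenSet[str], library: Dict[str, FrozenSet[str]]
) -> List[Tuple[str, float]]:
    """Return (compound, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, feats)) for name, feats in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical ring ("ring:") and substituent ("sub:") features; a real
# fingerprint would derive these by decomposing actual structures.
query = frozenset({"ring:benzene", "ring:pyridine", "sub:methoxy"})
library = {
    "cpd_A": frozenset({"ring:benzene", "ring:pyridine", "sub:methoxy", "sub:fluoro"}),
    "cpd_B": frozenset({"ring:benzene", "sub:methoxy"}),
    "cpd_C": frozenset({"ring:indole", "sub:carboxyl"}),
}

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Representing fingerprints as sets keeps every matching feature chemically readable, which mirrors the "chemical tangibility" the abstract emphasizes. A fixed-length bit vector would be the more common choice for large-scale screening.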


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/7115fdbf40e8/molecules-27-02331-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/b4270c3d1c50/molecules-27-02331-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/c55cc4126287/molecules-27-02331-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/a0e903b7e2e1/molecules-27-02331-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/8ba22d16360f/molecules-27-02331-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/016d/9000322/c97fd313c81d/molecules-27-02331-g006.jpg
