Context-dependent similarity searching for small molecular fragments.

Authors

Yoshimori Atsushi, Bajorath Jürgen

Affiliations

Institute for Theoretical Medicine, Inc., 26-1 Muraoka-Higashi 2-Chome, Fujisawa, Kanagawa, 251-0012, Japan.

Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, University of Bonn, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany.

Publication information

J Cheminform. 2025 May 26;17(1):83. doi: 10.1186/s13321-025-01032-1.

DOI:10.1186/s13321-025-01032-1
PMID:40420123
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12107754/
Abstract

Similarity searching is a mainstay of cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups) taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings as fragment representations generated using neural networks. With active analogue series as a model system to establish a global structure-activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and increases the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced focusing on individual substituents within a global substituent context or individual sequences of substituents establishing a local context. For similarity searching, different structural or structure-property contexts can be established, providing opportunities for various applications.
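
The idea of context-dependent similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes neural network embeddings, whereas the code below substitutes plain co-occurrence count vectors so that it runs with the standard library alone. The toy analogue series in `SERIES`, the function names `context_vectors`, `cosine`, and `search`, and the `window` parameter are all hypothetical choices made for illustration. Each analogue series is treated like a sentence, each substituent like a word, and two substituents count as similar when they appear alongside the same neighbours.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: each analogue series is an ordered list of
# substituents (R-groups), written as SMILES-like strings.
SERIES = [
    ["[*]F", "[*]Cl", "[*]Br", "[*]C"],
    ["[*]Cl", "[*]Br", "[*]I"],
    ["[*]OC", "[*]OCC", "[*]O"],
    ["[*]F", "[*]Cl", "[*]C(F)(F)F"],
    ["[*]O", "[*]OC", "[*]N"],
]


def context_vectors(series, window=2):
    """Build one sparse count vector per substituent.

    Assumption: co-occurrence counts within `window` positions stand in
    for the neural network embeddings mentioned in the abstract.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in series:
        for i, r_group in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[r_group][seq[j]] += 1
    # Return plain dicts so later lookups cannot create keys by accident.
    return {r: dict(ctx) for r, ctx in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, vectors, top_k=3):
    """Rank all other substituents by context similarity to `query`."""
    q_vec = vectors.get(query, {})
    scored = [(cosine(q_vec, vec), r) for r, vec in vectors.items() if r != query]
    return sorted(scored, reverse=True)[:top_k]


if __name__ == "__main__":
    vectors = context_vectors(SERIES)
    for score, r_group in search("[*]C", vectors):
        print(f"{r_group}\t{score:.3f}")
```

In this toy data set the methyl and iodo substituents share no atoms, so a structural fingerprint would have little to compare, yet they occur in the same contexts and therefore rank as close neighbours. That is the kind of remote relationship the abstract refers to, shown here only under the simplifying assumptions stated above.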

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/f1394a3dfa68/13321_2025_1032_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/f16e362e8959/13321_2025_1032_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/5951efb54193/13321_2025_1032_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/f05d32834544/13321_2025_1032_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/1bd0676d4a8f/13321_2025_1032_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/9e6ce9698877/13321_2025_1032_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/98a3/12107754/58edd5ae89e7/13321_2025_1032_Fig7_HTML.jpg
