Effectiveness of molecular fingerprints for exploring the chemical space of natural products.

Authors

Boldini Davide, Ballabio Davide, Consonni Viviana, Todeschini Roberto, Grisoni Francesca, Sieber Stephan A

Affiliations

TUM School of Natural Sciences, Department of Bioscience, Technical University of Munich, Center for Functional Protein Assemblies (CPA), 85748, Garching bei München, Germany.

Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.zza Della Scienza, 1, 20126, Milan, Italy.

Publication information

J Cheminform. 2024 Mar 25;16(1):35. doi: 10.1186/s13321-024-00830-3.

DOI:10.1186/s13321-024-00830-3
PMID:38528548
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10964529/
Abstract

Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, they have different structural motifs than typical drug-like compounds, e.g., a wider range of molecular weight, multiple stereocenters and a higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints .
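
To make the idea of comparing fingerprints through pairwise similarity more concrete, the following is a minimal sketch and not the authors' code. It assumes each fingerprint is represented as a set of "on" bit indices, uses Tanimoto similarity, and measures agreement between two encodings with a Pearson correlation; the abstract does not specify these choices, and the bit sets below are invented toy data rather than output of any real fingerprinting algorithm or of the NP_Fingerprints package.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def pairwise_similarities(fingerprints):
    """Similarity of every unordered pair of molecules, in a fixed order."""
    return [tanimoto(x, y) for x, y in combinations(fingerprints, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: the same four molecules encoded by two
# different (imaginary) fingerprints, as sets of on-bit indices.
encoding_a = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({7, 8, 9}),
    frozenset({2, 7, 8}),
]
encoding_b = [
    frozenset({10, 11}),
    frozenset({12, 13}),
    frozenset({10, 11, 14}),
    frozenset({12, 13, 14}),
]

sims_a = pairwise_similarities(encoding_a)
sims_b = pairwise_similarities(encoding_b)

# A low or negative value means the two encodings rank molecule pairs
# differently, i.e. they give different views of the same chemical space.
agreement = pearson(sims_a, sims_b)
print(f"pairwise-similarity agreement: {agreement:.3f}")
```

In this toy setup the two encodings deliberately disagree about which molecules are neighbours, which is the kind of divergence the abstract reports between real fingerprints. A rank-based correlation could be substituted for Pearson without changing the structure of the sketch.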

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c135/10964529/46f6f8f780e7/13321_2024_830_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c135/10964529/c662827cc4d8/13321_2024_830_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c135/10964529/9472d65a652b/13321_2024_830_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c135/10964529/f64ff79467a1/13321_2024_830_Fig4_HTML.jpg

Similar articles

1
Effectiveness of molecular fingerprints for exploring the chemical space of natural products.
J Cheminform. 2024 Mar 25;16(1):35. doi: 10.1186/s13321-024-00830-3.
2
jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints.
J Cheminform. 2011 Jan 10;3(1):3. doi: 10.1186/1758-2946-3-3.
3
Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development.
J Cheminform. 2020 Jan 22;12(1):6. doi: 10.1186/s13321-020-0410-3.
4
One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome.
J Cheminform. 2020 Jun 12;12(1):43. doi: 10.1186/s13321-020-00445-4.
5
Chemical Space, Scaffolds, and Halogenated Compounds of CMNPD: A Comprehensive Chemoinformatic Analysis.
J Chem Inf Model. 2021 Jul 26;61(7):3323-3336. doi: 10.1021/acs.jcim.1c00162. Epub 2021 Jun 22.
6
Prioritizing Virtual Screening with Interpretable Interaction Fingerprints.
J Chem Inf Model. 2022 Sep 26;62(18):4300-4318. doi: 10.1021/acs.jcim.2c00695. Epub 2022 Sep 14.
7
Using Domain-Specific Fingerprints Generated Through Neural Networks to Enhance Ligand-Based Virtual Screening.
J Chem Inf Model. 2021 Feb 22;61(2):664-675. doi: 10.1021/acs.jcim.0c01208. Epub 2021 Jan 26.
8
Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm.
J Cheminform. 2017 Aug 16;9(1):46. doi: 10.1186/s13321-017-0234-y.
9
Natural product scores and fingerprints extracted from artificial neural networks.
Comput Struct Biotechnol J. 2021 Jul 30;19:4593-4602. doi: 10.1016/j.csbj.2021.07.032. eCollection 2021.
10
Comparing structural fingerprints using a literature-based similarity benchmark.
J Cheminform. 2016 Jul 5;8:36. doi: 10.1186/s13321-016-0148-0. eCollection 2016.

Cited by

1
AI-assisted discovery of potent FGFR1 inhibitors via virtual screening and in silico analysis.
PLoS One. 2025 Sep 11;20(9):e0331837. doi: 10.1371/journal.pone.0331837. eCollection 2025.
2
Mixture of experts for multitask learning in cardiotoxicity assessment.
J Cheminform. 2025 Aug 29;17(1):135. doi: 10.1186/s13321-025-01072-7.
3
Exploring the anticancer potential of nitrated N-substituted-4-hydroxy-2-quinolone-3-carboxamides: synthesis, biological assessment, and computational analysis.
BMC Chem. 2025 Aug 22;19(1):247. doi: 10.1186/s13065-025-01616-w.
4
NfκBin: a machine learning based method for screening TNF-α induced NF-κB inhibitors.
Front Bioinform. 2025 Jul 17;5:1573744. doi: 10.3389/fbinf.2025.1573744. eCollection 2025.
5
The topology of molecular representations and its influence on machine learning performance.
J Cheminform. 2025 Jul 21;17(1):109. doi: 10.1186/s13321-025-01045-w.
6
MIC: A deep learning tool for assigning ions and waters in cryo-EM and crystal structures.
Nat Commun. 2025 Jul 4;16(1):6182. doi: 10.1038/s41467-025-61315-x.
7
A million shades of green: understanding and harnessing plant metabolic diversity.
EMBO J. 2025 Jul 3. doi: 10.1038/s44318-025-00496-z.
8
Machine Learning for Toxicity Prediction Using Chemical Structures: Pillars for Success in the Real World.
Chem Res Toxicol. 2025 May 19;38(5):759-807. doi: 10.1021/acs.chemrestox.5c00033. Epub 2025 May 2.
9
Growth vs. Diversity: A Time-Evolution Analysis of the Chemical Space.
bioRxiv. 2025 Feb 23:2025.02.18.638937. doi: 10.1101/2025.02.18.638937.
10
Artificial Intelligence in Natural Product Drug Discovery: Current Applications and Future Perspectives.
J Med Chem. 2025 Feb 27;68(4):3948-3969. doi: 10.1021/acs.jmedchem.4c01257. Epub 2025 Feb 6.

References

1
Artificial intelligence for natural product drug discovery.
Nat Rev Drug Discov. 2023 Nov;22(11):895-916. doi: 10.1038/s41573-023-00774-7. Epub 2023 Sep 11.
2
Exposing the Limitations of Molecular Machine Learning with Activity Cliffs.
J Chem Inf Model. 2022 Dec 12;62(23):5938-5951. doi: 10.1021/acs.jcim.2c01073. Epub 2022 Dec 1.
3
Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.
J Cheminform. 2021 Oct 18;13(1):82. doi: 10.1186/s13321-021-00559-3.
4
Learning from Nature: From a Marine Natural Product to Synthetic Cyclooxygenase-1 Inhibitors by Automated De Novo Design.
Adv Sci (Weinh). 2021 Aug;8(16):e2100832. doi: 10.1002/advs.202100832. Epub 2021 Jun 27.
5
Natural products in drug discovery: advances and opportunities.
Nat Rev Drug Discov. 2021 Mar;20(3):200-216. doi: 10.1038/s41573-020-00114-z. Epub 2021 Jan 28.
6
An open source chemical structure curation pipeline using RDKit.
J Cheminform. 2020 Sep 1;12(1):51. doi: 10.1186/s13321-020-00456-1.
7
Review on natural products databases: where to find data in 2020.
J Cheminform. 2020 Apr 3;12(1):20. doi: 10.1186/s13321-020-00424-9.
8
One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome.
J Cheminform. 2020 Jun 12;12(1):43. doi: 10.1186/s13321-020-00445-4.
9
Development of Natural Compound Molecular Fingerprint (NC-MFP) with the Dictionary of Natural Products (DNP) for natural product-based drug development.
J Cheminform. 2020 Jan 22;12(1):6. doi: 10.1186/s13321-020-0410-3.
10
COCONUT online: Collection of Open Natural Products database.
J Cheminform. 2021 Jan 10;13(1):2. doi: 10.1186/s13321-020-00478-9.