Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery?

Affiliations

Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China.

Tencent AI Lab, Shenzhen 518063, China.

Publication information

J Chem Inf Model. 2024 Apr 8;64(7):2921-2930. doi: 10.1021/acs.jcim.3c01707. Epub 2023 Dec 25.

DOI:10.1021/acs.jcim.3c01707
PMID:38145387
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11005046/
Abstract

Self-supervised pretrained models are increasingly popular in AI-aided drug discovery, and a growing number of them promise to extract better feature representations for molecules. Yet the quality of the learned representations has not been fully explored. In this work, inspired by two phenomena in traditional Quantitative Structure-Activity Relationship (QSAR) analysis, Activity Cliffs (ACs) and Scaffold Hopping (SH), we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by a pretrained model and to visualize the relationship between representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints (ECFP), while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models, and our findings can guide the community toward better pretraining techniques that regularize the occurrence of ACs and SH.
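To make the generalized notions concrete, the following is a minimal Python sketch of how cliff-like and hop-like molecule pairs might be flagged in a representation-property setting. It is not the RePRA procedure or either of its two scores, which the abstract does not define. The function names, the choice of cosine similarity, and the thresholds `sim_hi`, `sim_lo`, and `prop_gap` are all hypothetical and chosen only for illustration. The idea is that a pair with very similar representations but very different property values resembles an activity cliff, while a pair with dissimilar representations but similar property values resembles scaffold hopping.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_pairs(reps, props, sim_hi=0.9, sim_lo=0.3, prop_gap=1.0):
    """Label molecule pairs as cliff-like or hop-like.

    The three thresholds are hypothetical values for illustration only;
    they are not taken from the paper.
    """
    cliffs, hops = [], []
    for i, j in combinations(range(len(reps)), 2):
        sim = cosine_similarity(reps[i], reps[j])
        gap = abs(props[i] - props[j])
        if sim >= sim_hi and gap >= prop_gap:
            cliffs.append((i, j))  # close representations, distant properties
        elif sim <= sim_lo and gap < prop_gap:
            hops.append((i, j))    # distant representations, close properties
    return cliffs, hops


def pair_fractions(reps, props, **thresholds):
    """Fraction of all pairs that are cliff-like and hop-like, respectively."""
    n_pairs = len(reps) * (len(reps) - 1) // 2
    cliffs, hops = classify_pairs(reps, props, **thresholds)
    return len(cliffs) / n_pairs, len(hops) / n_pairs


# Toy data (made up): four 2-D "representations" and one property value each.
reps = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]
props = [5.0, 7.5, 5.2, 9.0]

cliffs, hops = classify_pairs(reps, props)
cliff_frac, hop_frac = pair_fractions(reps, props)
print(cliffs, hops)
```

On this toy data, molecules 0 and 1 have nearly parallel representations but property values 2.5 apart, so they form the only cliff-like pair. Molecules 0 and 2 are orthogonal yet differ by only 0.2, so they form the only hop-like pair. The same tally could be run on ECFP bit vectors and on pretrained embeddings of the same molecules to compare how often each representation produces such pairs.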

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/74da68b9ed8d/ci3c01707_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/a1c3017bfba5/ci3c01707_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/fea3e96e2ddc/ci3c01707_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/578fa63fb6ed/ci3c01707_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/1fb39c3d8c27/ci3c01707_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9ca/11005046/6926c67d0f92/ci3c01707_0005.jpg
