MolFCL: predicting molecular properties through chemistry-guided contrastive and prompt learning.

Authors

Tang Xiang, Zhao Qichang, Wang Jianxin, Duan Guihua

Affiliation

Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Publication

Bioinformatics. 2025 Feb 4;41(2). doi: 10.1093/bioinformatics/btaf061.

Abstract

MOTIVATION

Accurately identifying and predicting molecular properties is a crucial task in molecular machine learning, and the key lies in how to extract effective molecular representations. Contrastive learning opens new avenues for representation learning, and a large amount of unlabeled data enables the model to generalize to the huge chemical space. However, existing contrastive learning-based models face two challenges: (i) existing methods destroy the original molecular environment and ignore chemical prior information, and (ii) there is a lack of prior knowledge to guide the prediction of molecular properties.

RESULTS

In this work, we propose a molecular property prediction framework called MolFCL, which consists of fragment-based contrastive learning and functional group-based prompt learning. Specifically, we introduce fragment-fragment interactions for the first time in the contrastive learning framework and design a fragment-based augmented molecular graph that integrates the original chemical environment and fragment reactions. Furthermore, we propose a novel functional group-based prompt learning method for fine-tuning, which first incorporates functional group knowledge and the corresponding atomic signals, to improve molecular representation and provide interpretable analyses. The results show that MolFCL outperforms state-of-the-art baseline models on 23 molecular property prediction datasets. Moreover, visualizations show that MolFCL learns to embed molecules into representations that distinguish chemical properties. During prediction, MolFCL gives higher weight to functional groups consistent with chemical knowledge, which makes the model's predictions interpretable. Overall, MolFCL is a practically useful tool for molecular property prediction and can assist drug scientists in designing drugs more effectively.

AVAILABILITY AND IMPLEMENTATION

MolFCL is available at https://github.com/tangxiangcsu/MolFCL.
