GDMol: Generative Double-Masking Self-Supervised Learning for Molecular Property Prediction.

Authors

Liu Yingxu, Fan Qing, Xu Chengcheng, Ning Xiangzhen, Wang Yu, Liu Yang, Xie Yu, Zhang Yanmin, Chen Yadong, Liu Haichun

Affiliation

School of Science, China Pharmaceutical University, Nanjing, 210009, China.

Publication

Mol Inform. 2025 Jan;44(1):e202400146. doi: 10.1002/minf.202400146. Epub 2024 Oct 24.

Abstract

BACKGROUND

Effective molecular feature representation is crucial for drug property prediction. In recent years, graph neural networks (GNNs) pre-trained with self-supervised learning have attracted growing attention as a way to overcome the scarcity of labeled data in molecular property prediction. Traditional GNNs used for self-supervised molecular property prediction typically perform a single masking operation on the nodes and edges of the input molecular graph. This masks only local information, which is insufficient for thorough self-supervised training.

METHOD

We therefore propose GDMol, a model for molecular property prediction based on generative double-masking self-supervised learning. GDMol integrates generative learning into the self-supervised framework to learn latent representations, then applies a second round of masking to those latent representations. This enables the model to better capture the global information and semantic knowledge of molecules, yielding a richer, more informative representation and, in turn, more accurate and robust molecular property prediction.
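The double-masking idea can be illustrated with a small sketch. The abstract does not specify GDMol's architecture, so everything below is an assumption made for illustration: a single graph-convolution-style encoder, a linear decoder, fixed mask tokens, and a mean-squared reconstruction loss on the masked nodes. The function names (`normalize_adjacency`, `double_mask_forward`) and all parameters are hypothetical, and no training loop is shown.

```python
import numpy as np


def normalize_adjacency(adj):
    """Add self-loops and apply symmetric normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def double_mask_forward(x, adj, w_enc, w_dec, input_token, latent_token,
                        input_mask, latent_mask):
    """One forward pass with two masking rounds (illustrative, not GDMol itself).

    First mask: selected rows of the input node features are replaced by
    `input_token`. Second mask: selected rows of the encoded latent
    representation are replaced by `latent_token` before decoding.
    """
    a_norm = normalize_adjacency(adj)

    # Round 1: mask node features in the input graph.
    x_masked = np.where(input_mask[:, None], input_token, x)

    # Encode with one message-passing step (assumed encoder).
    h = np.tanh(a_norm @ x_masked @ w_enc)

    # Round 2: mask the latent representations.
    h_masked = np.where(latent_mask[:, None], latent_token, h)

    # Decode back to feature space (assumed decoder).
    x_rec = a_norm @ h_masked @ w_dec

    # Generative objective: reconstruct the original features of the
    # nodes that were masked in round 1.
    diff = x_rec[input_mask] - x[input_mask]
    loss = float(np.mean(diff ** 2))
    return x_rec, loss


# Toy 5-atom chain; features and weights are random placeholders.
rng = np.random.default_rng(0)
n_nodes, n_feat, n_hidden = 5, 4, 8
adj = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

x = rng.normal(size=(n_nodes, n_feat))
w_enc = rng.normal(scale=0.5, size=(n_feat, n_hidden))
w_dec = rng.normal(scale=0.5, size=(n_hidden, n_feat))
input_token = np.zeros(n_feat)
latent_token = np.zeros(n_hidden)
input_mask = np.array([False, True, False, False, True])
latent_mask = np.array([True, False, False, True, False])

x_rec, loss = double_mask_forward(x, adj, w_enc, w_dec, input_token,
                                  latent_token, input_mask, latent_mask)
print("reconstruction loss on masked nodes:", loss)
```

In this sketch the second mask forces the decoder to rebuild masked inputs from neighbors whose latent vectors may themselves be hidden, which is one plausible reading of why a second masking round would push the model toward more global context.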

RESULTS

Our experiments on five datasets demonstrated the superior performance of GDMol in predicting molecular properties across different domains. Moreover, we used the masking operation to traverse the gradient change of each node: the sign of the change indicates whether the corresponding local structure contributes positively or negatively to the prediction outcome, and its magnitude indicates how strongly. This in-depth interpretative analysis not only enhances the model's interpretability but also provides more targeted insights and direction for optimizing drug molecules.
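The per-node analysis can be sketched as follows. The abstract gives no details of the gradient computation, so this example swaps in a simpler occlusion-style attribution: each node is masked in turn and the change in the predicted value is recorded. The toy model, the function names (`predict`, `node_contributions`), and the mask token are all hypothetical and stand in for the trained GDMol model.

```python
import numpy as np


def predict(x, adj, w, v):
    """Toy graph-level predictor (assumed): one message-passing step,
    mean pooling over nodes, then a linear readout."""
    a_hat = adj + np.eye(adj.shape[0])
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    h = np.tanh(a_norm @ x @ w)
    return float(h.mean(axis=0) @ v)


def node_contributions(x, adj, w, v, mask_token):
    """Mask each node in turn and record how the prediction changes.

    score[i] = prediction(original) - prediction(node i masked).
    A positive score means node i pushes the prediction up; a negative
    score means it pulls the prediction down; the absolute value
    reflects how strongly.
    """
    base = predict(x, adj, w, v)
    scores = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = mask_token
        scores[i] = base - predict(x_masked, adj, w, v)
    return scores


# Toy 4-atom ring; features and weights are random placeholders.
rng = np.random.default_rng(1)
n_nodes, n_feat, n_hidden = 4, 3, 6
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = rng.normal(size=(n_nodes, n_feat))
w = rng.normal(size=(n_feat, n_hidden))
v = rng.normal(size=n_hidden)

scores = node_contributions(x, adj, w, v, mask_token=np.zeros(n_feat))
for i, s in enumerate(scores):
    direction = "positive" if s > 0 else "negative"
    print(f"node {i}: {direction} contribution, magnitude {abs(s):.4f}")
```

Occlusion was chosen here only because it needs no automatic differentiation; with a differentiable framework, the same loop could record gradients with respect to each masked node instead.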

CONCLUSIONS

In summary, this research offers novel insights into improving molecular property prediction and paves the way for further research on applying generative learning and self-supervised learning in chemistry.
