Yang Xixi, Duan Yanjing, Cheng Zhixiang, Li Kun, Liu Yuansheng, Zeng Xiangxiang, Cao Dongsheng
College of Computer Science and Electronic Engineering, Hunan University, Changsha 410086, Hunan, China.
Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, China.
J Med Chem. 2024 Dec 12;67(23):21303-21316. doi: 10.1021/acs.jmedchem.4c02193. Epub 2024 Dec 2.
Deep learning models for molecular property prediction often rely on self-supervised pretraining, typically masked atom prediction, to learn general knowledge. However, the general knowledge gained from masked atom prediction differs dramatically from the graph-level optimization objectives of downstream tasks, which leads to suboptimal performance. For properties with limited data in particular, ignoring domain knowledge forces a direct search over an immense general-purpose space, making it infeasible to identify the global optimum. To address this, we propose MPCD, which improves pretraining transferability by using domain knowledge to align the optimization objectives of pretraining and fine-tuning. MPCD also leverages multitask learning to improve data utilization and model robustness. Technically, MPCD employs a relation-aware self-attention mechanism to capture both the local and global structures of molecules. Extensive validation demonstrates that MPCD outperforms state-of-the-art methods for absorption, distribution, metabolism, excretion, and toxicity (ADMET) and physicochemical property prediction across various data sizes.
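The abstract names a relation-aware self-attention mechanism but does not describe its form. As an illustration only, the following minimal sketch shows one common way to make self-attention relation-aware: adding a per-relation scalar bias to the attention logits. This is an assumed variant, not the MPCD implementation; the function names, the three relation types (self, bonded, non-bonded), and all numeric values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def relation_aware_attention(h, rel, rel_bias, w_q, w_k, w_v):
    """Single-head self-attention with an additive relation bias.

    h        : n x d atom features
    rel      : n x n integer relation ids between atom pairs (hypothetical scheme)
    rel_bias : one scalar bias per relation id (learned in a real model)
    w_q/k/v  : d x d projection matrices
    """
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    logits = matmul(q, k_t)
    attn = []
    for i, row in enumerate(logits):
        # Scaled dot product plus the bias for the relation between atoms i and j.
        biased = [s / scale + rel_bias[rel[i][j]] for j, s in enumerate(row)]
        attn.append(softmax(biased))
    return matmul(attn, v), attn

# Toy three-atom chain; relation ids: 0 = self, 1 = bonded, 2 = non-bonded.
identity = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
rel_bias = [0.0, 1.0, -1.0]  # illustrative values: favor bonded pairs

out, attn = relation_aware_attention(h, rel, rel_bias, identity, identity, identity)
```

In this toy setup every atom still attends to every other atom, so global structure remains visible, while the bias shifts attention weight toward bonded neighbors to emphasize local structure.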