Department of Computer Science, Korea University, Seoul 02841, Republic of Korea.
AIGEN Sciences, Seoul 04778, Republic of Korea.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i369-i380. doi: 10.1093/bioinformatics/btae256.
Molecular core structures and R-groups are essential concepts in drug development. Integrating these concepts with conventional graph pre-training approaches can promote a deeper understanding of molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning to learn the decomposable parts of molecules that correspond to their core structures and peripheral R-groups. Furthermore, we formulate an additional framework that enables MolPLA to help chemists find replaceable R-groups in lead optimization scenarios.
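To make the contrastive idea concrete, the following is a minimal sketch of an InfoNCE-style objective that pulls each core embedding toward the embedding of its own R-group and pushes it away from the R-groups of other molecules in the batch. The abstract does not state the exact loss, so the choice of InfoNCE, the function name `info_nce`, the `temperature` value, and the plain-list embeddings are all assumptions made for illustration, not the MolPLA implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (assumed objective, not from the paper).

    anchors[i] (e.g. a core embedding) is paired with positives[i]
    (e.g. its R-group embedding); every other row of `positives`
    acts as a negative for anchors[i].
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarities to every candidate, scaled by the temperature.
        logits = [dot(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings, invented purely for illustration.
core_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_r_groups = [[1.0, 0.0], [0.0, 1.0]]
mismatched_r_groups = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = info_nce(core_embeddings, matched_r_groups)
misaligned_loss = info_nce(core_embeddings, mismatched_r_groups)
```

In this toy setting the loss is close to zero when each core lines up with its own R-group and much larger when the pairing is swapped, which is the behavior a contrastive pre-training signal relies on.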
Experimental results on molecular property prediction show that MolPLA achieves predictive performance comparable to current state-of-the-art models. Qualitative analyses indicate that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules, and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates.
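One plausible way to picture R-group suggestion for a query core is as nearest-neighbor retrieval in a shared embedding space. The sketch below ranks candidate R-groups by cosine similarity to a query vector. It assumes that a query core has already been mapped to such a vector; the function `rank_r_groups`, the library keys, and all numeric values are hypothetical placeholders and do not come from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_r_groups(query, library, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    `query` is an embedding for a core template with an open attachment
    site; `library` maps R-group names to embeddings (both assumed).
    """
    scored = [(name, cosine(query, vector)) for name, vector in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy library and query, invented purely for illustration.
toy_library = {
    "rgroup_a": [0.9, 0.1, 0.0],
    "rgroup_b": [0.1, 0.9, 0.1],
    "rgroup_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

ranked = rank_r_groups(toy_query, toy_library, top_k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A ranked list like this would only be a starting point for a chemist, who still has to judge synthetic feasibility and the effect of each replacement on the property being optimized.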
The code implementation of MolPLA and its pre-trained model checkpoint are available at https://github.com/dmis-lab/MolPLA.