Xu Xiaopeng, Zhou Juexiao, Zhu Chen, Zhan Qing, Li Zhongxiao, Zhang Ruochi, Wang Yu, Liao Xingyu, Gao Xin
Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
F1000Res. 2024 Feb 20;12:757. doi: 10.12688/f1000research.130936.2. eCollection 2023.
The key challenge in drug discovery is to discover novel compounds with desirable properties. Among these properties, binding affinity to a target is one of the prerequisites and is usually evaluated by molecular docking or quantitative structure-activity relationship (QSAR) models.
In this study, we developed SGPT-RL, which uses a generative pre-trained transformer (GPT) as the policy network of a reinforcement learning (RL) agent to optimize binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and on two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. Both QSAR models and molecular docking were used as optimization goals in these tasks. The popular Reinvent method served as the baseline for comparison.
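The paragraph above summarizes the core loop: an autoregressive GPT policy samples SMILES strings, a target-specific scoring function (QSAR prediction or docking score) rewards them, and the policy is updated by policy-gradient RL. The sketch below is purely illustrative and is not the authors' implementation; the tiny transformer, toy vocabulary, placeholder reward, and REINFORCE-with-baseline update are all assumptions standing in for the actual SGPT-RL components.

```python
# Illustrative sketch only: a GPT-style autoregressive policy samples SMILES
# tokens, a stand-in reward (playing the role of a QSAR model or docking
# score for DRD2 / ACE2) scores them, and a REINFORCE-style update adjusts
# the policy. Model size, vocabulary, and reward are placeholder assumptions.
import torch
import torch.nn as nn

VOCAB = ["<bos>", "<eos>", "C", "c", "N", "O", "(", ")", "=", "1", "2"]
stoi = {t: i for i, t in enumerate(VOCAB)}

class TinyPolicy(nn.Module):
    """Toy stand-in for the GPT policy network (one transformer layer)."""
    def __init__(self, d_model=64, n_head=4):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, 128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, tokens):
        # Causal mask so each position attends only to earlier tokens.
        L = tokens.size(1)
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        h = self.encoder(self.emb(tokens), mask=mask)
        return self.head(h)

def sample(policy, batch_size=8, max_len=30):
    """Autoregressively sample sequences and their summed log-probabilities."""
    tokens = torch.full((batch_size, 1), stoi["<bos>"], dtype=torch.long)
    log_probs = torch.zeros(batch_size)
    finished = torch.zeros(batch_size, dtype=torch.bool)
    for _ in range(max_len):
        logits = policy(tokens)[:, -1, :]
        dist = torch.distributions.Categorical(logits=logits)
        next_tok = dist.sample()
        log_probs = log_probs + dist.log_prob(next_tok) * (~finished)
        finished = finished | (next_tok == stoi["<eos>"])
        tokens = torch.cat([tokens, next_tok.unsqueeze(1)], dim=1)
        if finished.all():
            break
    return tokens, log_probs

def reward(tokens):
    """Placeholder reward; in the paper this role is played by a QSAR model
    or a rescaled molecular docking score for the target protein."""
    return (tokens == stoi["c"]).float().mean(dim=1)  # toy: favor aromatic C

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(5):
    seqs, log_probs = sample(policy)
    r = reward(seqs)
    loss = -((r - r.mean()) * log_probs).mean()  # REINFORCE with baseline
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: mean reward {r.mean():.3f}")
```

In practice the policy would be pre-trained on a large SMILES corpus before RL fine-tuning, and methods such as Reinvent additionally regularize the agent toward the pre-trained prior; those details are omitted here for brevity.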
The results on the Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent generated valid molecules with improved target scores. SGPT-RL achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis showed that SGPT-RL learned conserved scaffold patterns during exploration.
The superior performance of SGPT-RL on the ACE2 task indicates that it can be applied to virtual screening pipelines, where molecular docking is widely used as the selection criterion. In addition, the scaffold patterns learned by SGPT-RL during exploration can help chemists design and discover novel lead candidates.