Yue Jie, Peng Bingxin, Chen Yu, Jin Jieyu, Zhao Xinda, Shen Chao, Ji Xiangyang, Hsieh Chang-Yu, Song Jianfei, Hou Tingjun, Deng Yafeng, Wang Jike
College of Information Engineering, Hebei University of Architecture, Zhangjiakou 075132, Hebei, China.
CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China.
Chem Sci. 2024 Jul 29;15(34):13727-13740. doi: 10.1039/d4sc03744h. eCollection 2024 Aug 28.
Molecular generation stands at the forefront of AI-driven technologies and plays a crucial role in accelerating the development of small-molecule drugs. The intricate nature of practical drug discovery calls for a versatile molecular generation framework that can tackle diverse drug design challenges. However, existing methods, particularly those built on language models, often struggle to cover all aspects of small-molecule drug design; tasks such as linker design are especially difficult because of the autoregressive nature of large language model-based approaches. To equip a language model for a wider range of molecular design tasks, we introduce an unordered simplified molecular-input line-entry system based on fragments (FU-SMILES). Building on this representation, we propose FragGPT, a universal fragment-based molecular generation model. Pretrained on extensive molecular datasets, FragGPT uses FU-SMILES to generate molecules efficiently across a range of practical applications, including molecule design, linker design, R-group exploration, scaffold hopping, and side chain optimization. Furthermore, we integrate conditional generation and reinforcement learning (RL) so that the generated molecules possess multiple desired biological and physicochemical properties. Experimental results across diverse scenarios show that FragGPT generates molecules with enhanced properties and novel structures, outperforming existing state-of-the-art models. Its drug design capability is further corroborated by real-world drug design cases.