DrugAssist: a large language model for molecule optimization.

Authors

Ye Geyan, Cai Xibao, Lai Houtim, Wang Xing, Huang Junhong, Wang Longyue, Liu Wei, Zeng Xiangxiang

Affiliations

Tencent AI Lab, Tencent, Shenzhen 518057, China.

Department of Computer Science, Hunan University, Changsha 410008, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae693.

DOI:10.1093/bib/bbae693

PMID:39751647

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11697106/

Abstract

Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has prompted a growing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, has so far seen little involvement from LLMs. Most existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that drug discovery requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model that performs optimization through human-machine dialogue by leveraging the strong interactivity and generalizability of LLMs. DrugAssist achieves leading results in both single- and multiple-property optimization, while also showing strong potential for transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called 'MolOpt-Instructions' for fine-tuning language models on molecule optimization tasks. We have made our code and data publicly available at https://github.com/blazerye/DrugAssist, which we hope will pave the way for future research on applying LLMs to drug discovery.
