The integration of large language models (LLMs) into drug design is gaining momentum; however, existing approaches often struggle to effectively incorporate three-dimensional molecular structures. Here, we present Token-Mol, a token-only 3D drug design model that encodes both 2D and 3D structural information, along with molecular properties, into discrete tokens. Built on a transformer decoder and trained with causal masking, Token-Mol introduces a Gaussian cross-entropy loss function tailored for regression tasks, enabling superior performance across multiple downstream applications. The model surpasses existing methods, improving molecular conformation generation by over 10% and 20% across two datasets, while outperforming token-only models by 30% in property prediction. In pocket-based molecular generation, it enhances drug-likeness and synthetic accessibility by approximately 11% and 14%, respectively. Notably, Token-Mol operates 35 times faster than expert diffusion models. In real-world validation, it improves success rates and, when combined with reinforcement learning, further optimizes affinity and drug-likeness, advancing AI-driven drug discovery.
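The abstract names a Gaussian cross-entropy loss for regression but does not define it. One plausible reading, offered only as a sketch, is that a numeric value is discretized into ordered token bins and the usual one-hot target is replaced by a Gaussian-shaped soft target centered on the true bin, so that near misses cost less than distant ones. The binning scheme, the `sigma` parameter, and all function names below are illustrative assumptions, not the authors' implementation.

```python
import math


def gaussian_soft_targets(true_bin, num_bins, sigma=1.0):
    """Hypothetical soft label: Gaussian weights centered on the true bin,
    normalized so they sum to 1."""
    weights = [
        math.exp(-((i - true_bin) ** 2) / (2.0 * sigma ** 2))
        for i in range(num_bins)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def gaussian_cross_entropy(logits, true_bin, sigma=1.0):
    """Cross-entropy between the Gaussian soft target and the model's
    predicted distribution over numeric token bins (assumed formulation)."""
    targets = gaussian_soft_targets(true_bin, len(logits), sigma)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Two predictions for a true bin of 5: one peaks at bin 4, one at bin 0.
num_bins = 10
near = [0.0] * num_bins
near[4] = 5.0
far = [0.0] * num_bins
far[0] = 5.0

loss_near = gaussian_cross_entropy(near, true_bin=5)
loss_far = gaussian_cross_entropy(far, true_bin=5)
print(loss_near < loss_far)  # the near miss is penalized less
```

With a one-hot target both predictions would receive the same loss, since neither puts its peak on bin 5. The soft target is what makes the loss sensitive to numeric distance, which is the property a regression task needs.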

Token-Mol 1.0: tokenized drug design with large language models.
