ChatMol: interactive molecular discovery with natural language.

Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

PingAn Technology, Beijing 100027, China.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae534.

Abstract

MOTIVATION

Natural language is poised to become a key medium for human-machine interactions in the era of large language models. In the field of biochemistry, tasks such as property prediction and molecule mining are critically important yet technically challenging. Bridging molecular expressions in natural language and chemical language can significantly enhance the interpretability and ease of these tasks. Moreover, it can integrate chemical knowledge from various sources, leading to a deeper understanding of molecules.

RESULTS

Recognizing these advantages, we introduce the concept of conversational molecular design, a novel task that uses natural language to describe and edit target molecules. To better accomplish this task, we develop ChatMol, a knowledgeable and versatile generative pretrained model. The model is enhanced with experimental property information, molecular spatial knowledge, and the associations between natural and chemical languages. We evaluate several typical solutions, including large language models (e.g. ChatGPT); the results demonstrate both the difficulty of conversational molecular design and the effectiveness of our knowledge-enhancement approach. Case observations and analysis offer insights and directions for further exploration of natural-language interaction in molecular discovery.

AVAILABILITY AND IMPLEMENTATION

Code and data are available at https://github.com/Ellenzzn/ChatMol/tree/main.
