CACTUS: Chemistry Agent Connecting Tool Usage to Science.

Author information

McNaughton Andrew D, Sankar Ramalaxmi Gautham Krishna, Kruel Agustin, Knutson Carter R, Varikoti Rohith A, Kumar Neeraj

Affiliation

Pacific Northwest National Laboratory, Richland, Washington 99354, United States.

Publication information

ACS Omega. 2024 Oct 25;9(46):46563-46573. doi: 10.1021/acsomega.4c08408. eCollection 2024 Nov 19.

Abstract

Large language models (LLMs) have shown remarkable potential in various domains but often lack the ability to access and reason over domain-specific knowledge and tools. In this article, we introduce Chemistry Agent Connecting Tool-Usage to Science (CACTUS), an LLM-based agent that integrates existing cheminformatics tools to enable accurate and advanced reasoning and problem-solving in chemistry and molecular discovery. We evaluate the performance of CACTUS using a diverse set of open-source LLMs, including Gemma-7b, Falcon-7b, MPT-7b, Llama3-8b, and Mistral-7b, on a benchmark of thousands of chemistry questions. Our results demonstrate that CACTUS significantly outperforms baseline LLMs, with the Gemma-7b, Mistral-7b, and Llama3-8b models achieving the highest accuracy regardless of the prompting strategy used. Moreover, we explore the impact of domain-specific prompting and hardware configurations on model performance, highlighting the importance of prompt engineering and the potential for deploying smaller models on consumer-grade hardware without a significant loss in accuracy. By combining the cognitive capabilities of open-source LLMs with widely used domain-specific tools provided by RDKit, CACTUS can assist researchers in tasks such as molecular property prediction, similarity searching, and drug-likeness assessment.
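The tool-use idea in the abstract can be illustrated with a minimal sketch. This is not the CACTUS implementation: the tool registry, the `Descriptors` container, and every function name below are hypothetical, and the descriptor values are supplied by hand, whereas the paper states that CACTUS obtains them from RDKit. The sketch shows an agent dispatching a named tool for two of the tasks the abstract lists: a drug-likeness check based on Lipinski's rule of five, and a Tanimoto similarity between fingerprint bit sets.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Descriptors:
    """Hand-supplied molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five limits are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def is_drug_like(d: Descriptors) -> bool:
    """Treat a molecule as drug-like if it breaks at most one limit."""
    return lipinski_violations(d) <= 1


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical tool registry: the agent picks a tool by name and calls it.
TOOLS: Dict[str, Callable] = {
    "drug_likeness": is_drug_like,
    "similarity": tanimoto,
}


def run_tool(name: str, *args):
    """Dispatch a tool call the way an LLM agent's chosen action might."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*args)


if __name__ == "__main__":
    # Illustrative values for a small molecule, not computed by this code.
    small_molecule = Descriptors(mol_weight=180.16, logp=1.2,
                                 h_donors=1, h_acceptors=4)
    print(run_tool("drug_likeness", small_molecule))
    print(run_tool("similarity", frozenset({1, 2, 3}), frozenset({2, 3, 4})))
```

In this sketch the language model's only job would be to choose the tool name and its arguments; the numerical answer comes from deterministic code, which is the general motivation for connecting an LLM to cheminformatics tools.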


