Ghafarollahi Alireza, Buehler Markus J
Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA.
Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA 02139, USA.
Digit Discov. 2024 May 17;3(7):1389-1409. doi: 10.1039/d4dd00013g. eCollection 2024 Jul 10.
Designing proteins beyond those found in nature holds significant promise for both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, which limits their flexibility when out-of-domain knowledge must be incorporated into the design process or when comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for protein design based on Large Language Models (LLMs), in which multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures, and obtaining new first-principles data (natural vibrational frequencies) via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of proteins with targeted mechanical properties. The flexibility in designing the agents, on the one hand, and their capacity for autonomous collaboration through the dynamic LLM-based multi-agent environment, on the other hand, unleash the great potential of LLMs in addressing multi-objective materials problems and open up new avenues for autonomous materials discovery and design.