Fu Fei, Li Qing-Qing, Wang Fangrong, Hu Jie, Wang Tian-Tian, Liu Yun-Pei, Xu Weihong, Lin Zhili, Gong Fu-Qiang, Fan Qi-Yuan, Pan Jeff Z, Wang Ye, Cheng Jun
State Key Laboratory of Physical Chemistry of Solid Surface, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, China.
School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, UK.
Natl Sci Rev. 2025 Jul 14;12(8):nwaf271. doi: 10.1093/nsr/nwaf271. eCollection 2025 Aug.
Relay catalysis integrates multiple catalytic reactions so that intermediates are transformed efficiently, improving conversion and selectivity. However, designing such pathways and the multifunctional catalysts they require is often lengthy and costly, and relies heavily on in-depth literature analysis by experienced researchers. To address this, we developed an approach that combines a knowledge graph (KG) with large language models (LLMs) to automatically recommend multistep catalytic reaction pathways. An LLM-assisted workflow first acquires and organizes the data, from which a detailed catalysis knowledge graph (Cat-KG) is constructed. The Cat-KG is then queried, and promising relay catalysis pathways are identified by applying scoring rules informed by expertise in relay catalysis. Finally, the LLM transforms the structured pathways and reaction condition data into chemical equations and descriptions that chemists can read. Because this step draws on catalysis knowledge from the Cat-KG, it relies on reliable information and helps avoid LLM-induced hallucinations. The method recommended relay catalysis pathways for ethylene, ethanol, 2,5-furandicarboxylate and other targets within minutes, and identified pathways consistent with reported ones while using different reaction conditions, which supports its effectiveness. The strategy can thus recover known relay catalysis pathways and propose novel ones, showing its potential for pathway selection.
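The query-then-score step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Reaction` record, the species labels `A` to `D`, the catalyst names, the temperatures, and the scoring rule (preferring pathways whose steps run at similar temperatures and that use fewer steps) are all hypothetical placeholders chosen for illustration. The abstract does not specify the actual Cat-KG schema or scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One edge of a toy reaction graph (hypothetical schema)."""
    reactant: str
    product: str
    catalyst: str
    temperature_c: float  # invented value, for illustration only


# Placeholder data; none of these entries come from the paper.
REACTIONS = [
    Reaction("A", "B", "cat_1", 250.0),
    Reaction("B", "D", "cat_2", 300.0),
    Reaction("A", "C", "cat_3", 200.0),
    Reaction("C", "D", "cat_4", 450.0),
    Reaction("A", "D", "cat_5", 600.0),
]


def find_pathways(reactions, source, target, max_steps=3):
    """Enumerate acyclic pathways from source to target by depth-first search."""
    by_reactant = {}
    for rxn in reactions:
        by_reactant.setdefault(rxn.reactant, []).append(rxn)

    pathways = []

    def dfs(species, path, visited):
        if species == target and path:
            pathways.append(list(path))
            return
        if len(path) == max_steps:
            return
        for rxn in by_reactant.get(species, []):
            if rxn.product not in visited:
                path.append(rxn)
                visited.add(rxn.product)
                dfs(rxn.product, path, visited)
                path.pop()
                visited.remove(rxn.product)

    dfs(source, [], {source})
    return pathways


def score(path):
    """Hypothetical rule: penalize temperature mismatch between steps and path length."""
    temps = [rxn.temperature_c for rxn in path]
    spread = max(temps) - min(temps)
    return -(spread + 10.0 * len(path))


def recommend(reactions, source, target, min_steps=2):
    """Return multistep pathways, best score first."""
    candidates = [p for p in find_pathways(reactions, source, target)
                  if len(p) >= min_steps]
    return sorted(candidates, key=score, reverse=True)


def species_sequence(path):
    """List the species visited along a pathway, starting with the feedstock."""
    return [path[0].reactant] + [rxn.product for rxn in path]


if __name__ == "__main__":
    for pathway in recommend(REACTIONS, "A", "D"):
        print(" -> ".join(species_sequence(pathway)), score(pathway))
```

In this toy graph the direct route `A -> D` is filtered out because relay catalysis concerns multistep pathways, and the route through `B` ranks above the route through `C` because its two steps operate at closer temperatures. A real scoring scheme would encode the domain expertise mentioned in the abstract rather than this single heuristic.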