Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions.

Author information

Zheng Zhiling, Florit Federico, Jin Brooke, Wu Haoyang, Li Shih-Cheng, Nandiwale Kakasaheb Y, Salazar Chase A, Mustakis Jason G, Green William H, Jensen Klavs F

Affiliations

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02142, United States.

Chemical Research & Development, Pfizer Worldwide Research and Development, Groton, CT 06340, United States.

Publication information

Angew Chem Int Ed Engl. 2025 Feb 3;64(6):e202418074. doi: 10.1002/anie.202418074. Epub 2024 Dec 18.

Abstract

Electrochemical C-H oxidation reactions offer a sustainable route to functionalize hydrocarbons, yet identifying suitable substrates and optimizing synthesis remain challenging. Here, we report an integrated approach combining machine learning (ML) and large language models (LLMs) to streamline the exploration of electrochemical C-H oxidation reactions. Using a batch rapid-screening electrochemical platform, we evaluated a wide range of reactions and initially classified substrates by their reactivity, while LLMs text-mined literature data to augment the training set. The resulting ML models for reactivity prediction achieved high accuracy (>90%) and enabled virtual screening of a large set of commercially available molecules. To optimize reaction conditions for selected substrates, LLMs were prompted to generate code that iteratively improved yields. This human-AI collaboration proved effective, efficiently identifying high-yield conditions for 8 drug-like substances or intermediates. Notably, we benchmarked the accuracy and reliability of 12 different LLMs (including the LLaMA series, the Claude series, OpenAI o1, and GPT-4) on ML-related code generation and function calling, based on natural language prompts given by chemists, across four diverse tasks to showcase their potential for accelerating research. In addition, we collected an experimental benchmark dataset comprising 1071 reaction conditions and yields for electrochemical C-H oxidation reactions.
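The reactivity classification and virtual screening steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not say which features or algorithms were used, so the sketch assumes a simple k-nearest-neighbor vote over Tanimoto similarity between structural fingerprints. The names `tanimoto`, `predict_reactive`, and `virtual_screen`, the fingerprint bit indices, and all data below are hypothetical and made up for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# ASSUMPTION: a fingerprint is the set of "on" bit indices of some structural
# fingerprint. Real fingerprints would come from a cheminformatics toolkit.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_reactive(
    query: Fingerprint, training: List[Tuple[Fingerprint, int]], k: int = 3
) -> float:
    """Fraction of the k most similar training substrates labeled reactive (1)."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def virtual_screen(
    candidates: Dict[str, Fingerprint],
    training: List[Tuple[Fingerprint, int]],
    threshold: float = 0.5,
    k: int = 3,
) -> List[str]:
    """Names of candidates whose predicted reactive fraction meets the threshold."""
    return [
        name
        for name, fp in candidates.items()
        if predict_reactive(fp, training, k) >= threshold
    ]


# Toy, made-up training set: (fingerprint, label), 1 = reactive, 0 = unreactive.
TRAINING: List[Tuple[Fingerprint, int]] = [
    (frozenset({1, 2, 3}), 1),
    (frozenset({1, 2, 4}), 1),
    (frozenset({2, 3, 5}), 1),
    (frozenset({7, 8, 9}), 0),
    (frozenset({7, 8, 10}), 0),
    (frozenset({8, 9, 11}), 0),
]

# Toy, made-up candidate library standing in for commercially available molecules.
CANDIDATES: Dict[str, Fingerprint] = {
    "candidate_A": frozenset({1, 2, 3, 4}),
    "candidate_B": frozenset({7, 8, 9, 10}),
}

if __name__ == "__main__":
    print(virtual_screen(CANDIDATES, TRAINING))
```

In this sketch, text-mined literature examples would simply be appended to `TRAINING` as additional labeled fingerprints, which is one plausible reading of how literature data could augment an experimental training set.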

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1655/11795713/aa7b3708a090/ANIE-64-e202418074-g001.jpg
