Automated electrosynthesis reaction mining with multimodal large language models (MLLMs).

Author information

Leong Shi Xuan, Pablo-García Sergio, Zhang Zijian, Aspuru-Guzik Alán

Affiliations

Department of Chemistry, University of Toronto, Lash Miller Chemical Laboratories, 80 St. George Street, Toronto, ON M5S 3H6, Canada.

Division of Chemistry and Biological Chemistry, School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, 21 Nanyang Link, Singapore 637371.

Publication information

Chem Sci. 2024 Oct 9;15(43):17881-91. doi: 10.1039/d4sc04630g.

DOI: 10.1039/d4sc04630g

PMID: 39397816

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11462585/

Abstract

Leveraging the chemical data available in legacy formats such as publications and patents is a significant challenge for the community. Automated reaction mining offers a promising solution to unleash this knowledge into a learnable digital form and therefore help expedite materials and reaction discovery. However, existing reaction mining toolkits are limited to single input modalities (text or images) and cannot effectively integrate heterogeneous data that is scattered across text, tables, and figures. In this work, we go beyond single input modalities and explore multimodal large language models (MLLMs) for the analysis of diverse data inputs for automated electrosynthesis reaction mining. We compiled a test dataset of 65 articles (MERMES-T24 set) and employed it to benchmark five prominent MLLMs against two critical tasks: (i) reaction diagram parsing and (ii) resolving cross-modality data interdependencies. The frontrunner MLLM achieved ≥96% accuracy in both tasks, with the strategic integration of single-shot visual prompts and image pre-processing techniques. We integrate this capability into a toolkit named MERMES (multimodal reaction mining pipeline for electrosynthesis). Our toolkit functions as an end-to-end MLLM-powered pipeline that integrates article retrieval, information extraction and multimodal analysis for streamlining and automating knowledge extraction. This work lays the groundwork for the increased utilization of MLLMs to accelerate the digitization of chemistry knowledge for data-driven research.
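
The three stages named in the abstract (article retrieval, information extraction, multimodal analysis) can be pictured as a simple chain of functions. The Python sketch below is illustrative only and is not taken from MERMES: `Article`, `ReactionRecord`, `run_pipeline`, and the three stub stages are hypothetical names, and the stubs return placeholder data instead of calling a publisher API or an MLLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Article:
    """Hypothetical container for one retrieved article."""
    doi: str
    text: str = ""
    tables: List[str] = field(default_factory=list)
    figures: List[str] = field(default_factory=list)  # e.g. image file names


@dataclass
class ReactionRecord:
    """Hypothetical structured output for one mined reaction."""
    doi: str
    fields: Dict[str, str]


def run_pipeline(
    dois: List[str],
    retrieve: Callable[[str], Article],
    extract: Callable[[Article], Article],
    analyze: Callable[[Article], List[ReactionRecord]],
) -> List[ReactionRecord]:
    """Chain retrieval -> extraction -> multimodal analysis for each DOI."""
    records: List[ReactionRecord] = []
    for doi in dois:
        article = retrieve(doi)           # stage 1: fetch the article
        article = extract(article)        # stage 2: pull out text, tables, figures
        records.extend(analyze(article))  # stage 3: combine modalities into records
    return records


# Stub stages (assumptions) standing in for real retrieval, parsing, and MLLM calls.
def fake_retrieve(doi: str) -> Article:
    return Article(doi=doi, text="placeholder text")


def fake_extract(article: Article) -> Article:
    article.figures.append("scheme_1.png")
    return article


def fake_analyze(article: Article) -> List[ReactionRecord]:
    return [
        ReactionRecord(doi=article.doi, fields={"source_figure": fig})
        for fig in article.figures
    ]


records = run_pipeline(["10.0000/example"], fake_retrieve, fake_extract, fake_analyze)
```

Passing the stages in as callables keeps each one swappable, for instance to compare several MLLMs on the same extracted figures.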

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1047/11539266/df3052274647/d4sc04630g-f1.jpg

Similar articles

Automated electrosynthesis reaction mining with multimodal large language models (MLLMs).

Chem Sci. 2024 Oct 9;15(43):17881-91. doi: 10.1039/d4sc04630g.

Q-BENCH: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs.

IEEE Trans Pattern Anal Mach Intell. 2024 Aug 21;PP. doi: 10.1109/TPAMI.2024.3445770.

Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning.

IEEE Trans Vis Comput Graph. 2025 Jan;31(1):525-535. doi: 10.1109/TVCG.2024.3456159. Epub 2024 Nov 25.

ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature.

J Chem Inf Model. 2016 Oct 24;56(10):1894-1904. doi: 10.1021/acs.jcim.6b00207. Epub 2016 Oct 6.

BatteryDataExtractor: battery-aware text-mining software embedded with BERT models.

Chem Sci. 2022 Sep 23;13(39):11487-11495. doi: 10.1039/d2sc04322j. eCollection 2022 Oct 12.

OpenChemIE: An Information Extraction Toolkit for Chemistry Literature.

J Chem Inf Model. 2024 Jul 22;64(14):5521-5534. doi: 10.1021/acs.jcim.4c00572. Epub 2024 Jul 1.

Large language model to multimodal large language model: A journey to shape the biological macromolecules to biological sciences and medicine.

Mol Ther Nucleic Acids. 2024 Jun 15;35(3):102255. doi: 10.1016/j.omtn.2024.102255. eCollection 2024 Sep 10.

iTextMine: integrated text-mining system for large-scale knowledge extraction from the literature.

Database (Oxford). 2018 Jan 1;2018:bay128. doi: 10.1093/database/bay128.

Development of an information retrieval tool for biomedical patents.

Comput Methods Programs Biomed. 2018 Jun;159:125-134. doi: 10.1016/j.cmpb.2018.03.012. Epub 2018 Mar 14.

Fine-tuning large language models for chemical text mining.

Chem Sci. 2024 Jun 7;15(27):10600-10611. doi: 10.1039/d4sc00924j. eCollection 2024 Jul 10.

Cited by

Steering towards safe self-driving laboratories.

Nat Rev Chem. 2025 Aug 18. doi: 10.1038/s41570-025-00747-x.
