Leong Shi Xuan, Pablo-García Sergio, Zhang Zijian, Aspuru-Guzik Alán
Department of Chemistry, University of Toronto, Lash Miller Chemical Laboratories, 80 St. George Street, Toronto, ON M5S 3H6, Canada
Division of Chemistry and Biological Chemistry, School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, 21 Nanyang Link, Singapore 637371
Chem Sci. 2024 Oct 9;15(43):17881-91. doi: 10.1039/d4sc04630g.
Leveraging the chemical data available in legacy formats such as publications and patents is a significant challenge for the community. Automated reaction mining offers a promising way to convert this knowledge into a learnable digital form and thereby expedite materials and reaction discovery. However, existing reaction mining toolkits are limited to a single input modality (text or images) and cannot effectively integrate heterogeneous data scattered across text, tables, and figures. In this work, we go beyond single input modalities and explore multimodal large language models (MLLMs) for analyzing diverse data inputs in automated electrosynthesis reaction mining. We compiled a test dataset of 65 articles (the MERMES-T24 set) and used it to benchmark five prominent MLLMs on two critical tasks: (i) reaction diagram parsing and (ii) resolving cross-modality data interdependencies. The frontrunner MLLM achieved ≥96% accuracy in both tasks when combined with single-shot visual prompts and image pre-processing techniques. We integrate this capability into a toolkit named MERMES (multimodal reaction mining pipeline for electrosynthesis), an end-to-end MLLM-powered pipeline that combines article retrieval, information extraction, and multimodal analysis to streamline and automate knowledge extraction. This work lays the groundwork for wider use of MLLMs to accelerate the digitization of chemistry knowledge for data-driven research.