Probing the limitations of multimodal language models for chemistry and materials research.

Authors

Alampara Nawaf, Schilling-Wilhelmi Mara, Ríos-García Martiño, Mandal Indrajeet, Khetarpal Pranav, Grover Hargun Singh, Krishnan N M Anoop, Jablonka Kevin Maik

Affiliations

Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Jena, Germany.

School of Interdisciplinary Research, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.

Publication information

Nat Comput Sci. 2025 Oct;5(10):952-961. doi: 10.1038/s43588-025-00836-3. Epub 2025 Aug 11.

DOI:10.1038/s43588-025-00836-3
PMID:40789967
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12513823/
Abstract

Recent advancements in artificial intelligence have sparked interest in scientific assistants that could support researchers across the full spectrum of scientific workflows, from literature review to experimental design and data analysis. A key capability for such systems is the ability to process and reason about scientific information in both visual and textual forms, from interpreting spectroscopic data to understanding laboratory set-ups. Here we introduce MaCBench, a comprehensive benchmark for evaluating how vision language models handle real-world chemistry and materials science tasks across three core aspects: data extraction, experimental execution and results interpretation. Through a systematic evaluation of leading models, we find that although these systems show promising capabilities in basic perception tasks (achieving near-perfect performance in equipment identification and standardized data extraction), they exhibit fundamental limitations in spatial reasoning, cross-modal information synthesis and multi-step logical inference. Our insights have implications beyond chemistry and materials science, suggesting that developing reliable multimodal AI scientific assistants may require advances in curating suitable training data and approaches to training those models.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da35/12513823/0e959fddcfcf/43588_2025_836_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da35/12513823/407bd4578618/43588_2025_836_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da35/12513823/3b3fe9ba2bc0/43588_2025_836_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da35/12513823/1e9e8d65b195/43588_2025_836_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/da35/12513823/85fb33b48104/43588_2025_836_Fig5_HTML.jpg

Cited by

1. Evaluating large language models on multimodal chemistry olympiad exams. Commun Chem. 2025 Dec 13;8(1):402. doi: 10.1038/s42004-025-01782-x.
2. Evaluating large language model agents for automation of atomic force microscopy. Nat Commun. 2025 Oct 14;16(1):9104. doi: 10.1038/s41467-025-64105-7.
