Evaluating large language model agents for automation of atomic force microscopy.

Authors

Mandal Indrajeet, Soni Jitendra, Zaki Mohd, Smedskjaer Morten M, Wondraczek Katrin, Wondraczek Lothar, Gosvami Nitya Nand, Krishnan N M Anoop

Affiliations

School of Interdisciplinary Research, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.

Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.

Publication information

Nat Commun. 2025 Oct 14;16(1):9104. doi: 10.1038/s41467-025-64105-7.

DOI: 10.1038/s41467-025-64105-7
PMID: 41087366
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12521570/
Abstract

Large language models (LLMs) are transforming laboratory automation by enabling self-driving laboratories (SDLs) that could accelerate materials research. However, current SDL implementations rely on rigid protocols that fail to capture the adaptability and intuition of expert scientists in dynamic experimental settings. Here, we show that LLM agents can automate atomic force microscopy (AFM) through our Artificially Intelligent Lab Assistant (AILA) framework. Further, we develop AFMBench, a comprehensive evaluation suite challenging LLM agents across the complete scientific workflow from experimental design to results analysis. We find that state-of-the-art LLMs struggle with basic tasks and coordination scenarios. Notably, models excelling at materials science question-answering perform poorly in laboratory settings, showing that domain knowledge does not translate to experimental capabilities. Additionally, we observe that LLM agents can deviate from instructions, a phenomenon referred to as sleepwalking, raising safety alignment concerns for SDL applications. Our ablations reveal that multi-agent frameworks significantly outperform single-agent approaches, though both remain sensitive to minor changes in instruction formatting or prompting. Finally, we evaluate AILA's effectiveness in increasingly advanced experiments: AFM calibration, feature detection, mechanical property measurement, graphene layer counting, and indenter detection. These findings establish the necessity for benchmarking and robust safety protocols before deploying LLM agents as autonomous laboratory assistants across scientific disciplines.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/37bf9e9d3570/41467_2025_64105_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/b0bd4caa5e87/41467_2025_64105_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/db1022d2de41/41467_2025_64105_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/1d760baad21a/41467_2025_64105_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/eb19776d5e40/41467_2025_64105_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d2e/12521570/1a38483b7f8e/41467_2025_64105_Fig6_HTML.jpg
