A critical reflection on attempts to machine-learn materials synthesis insights from text-mined literature recipes.

Authors

Wenhao Sun, Nicholas David

Affiliation

Department of Materials Science and Engineering, University of Michigan, Ann Arbor, MI, USA.

Publication

Faraday Discuss. 2025 Jan 14;256(0):614-638. doi: 10.1039/d4fd00112e.

DOI:10.1039/d4fd00112e
PMID:39351769
Abstract

Synthesis of predicted materials is the key and final step needed to realize a vision of computationally accelerated materials discovery. Because so many materials have been previously synthesized, one would anticipate that text-mining synthesis recipes from the literature would yield a valuable dataset to train machine-learning models that can predict synthesis recipes for new materials. Between 2016 and 2019, the corresponding author (Wenhao Sun) participated in efforts to text-mine 31 782 solid-state synthesis recipes and 35 675 solution-based synthesis recipes from the literature. Here, we characterize these datasets and show that they do not satisfy the "4 Vs" of data science, that is: volume, variety, veracity and velocity. For this reason, we believe that machine-learned regression or classification models built from these datasets will have limited utility in guiding the predictive synthesis of novel materials. On the other hand, these large datasets provided an opportunity to identify anomalous synthesis recipes, which in fact did inspire new hypotheses on how materials form that we later validated by experiment. Our case study urges a re-evaluation of how to extract the most value from large historical materials-science datasets.

Similar articles

1. A critical reflection on attempts to machine-learn materials synthesis insights from text-mined literature recipes.
   Faraday Discuss. 2025 Jan 14;256(0):614-638. doi: 10.1039/d4fd00112e.
2. Text-mined dataset of inorganic materials synthesis recipes.
   Sci Data. 2019 Oct 15;6(1):203. doi: 10.1038/s41597-019-0224-1.
3. Precursor recommendation for inorganic synthesis by machine learning materials similarity from scientific literature.
   Sci Adv. 2023 Jun 9;9(23):eadg8180. doi: 10.1126/sciadv.adg8180.
4. Plants and their uses in dermatological recipes of the Receptarium of Burkhard III von Hallwyl from 16th century Switzerland - Data mining a historical text and preliminary in vitro screening.
   J Ethnopharmacol. 2024 Dec 5;335:118633. doi: 10.1016/j.jep.2024.118633. Epub 2024 Aug 2.
5. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6. Author Correction: Text-mined dataset of inorganic materials synthesis recipes.
   Sci Data. 2019 Nov 15;6(1):273. doi: 10.1038/s41597-019-0297-x.
7. An autonomous laboratory for the accelerated synthesis of novel materials.
   Nature. 2023 Dec;624(7990):86-91. doi: 10.1038/s41586-023-06734-w. Epub 2023 Nov 29.
8. Social Media Mining for an Analysis of Nutrition and Dietary Health in Taiwan.
   Nutrients. 2021 May 23;13(6):1778. doi: 10.3390/nu13061778.
9. Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
   J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
10. An Exploration into Life, Body, Materials, Culture of Mediaeval East Asia: Focusing on Emergency Medicine Recipes in Local Medicinals of Koryŏ Dynasty.
   Uisahak. 2019 Apr;28(1):1-42. doi: 10.13081/kjmh.2019.28.1.

Cited by

1. Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs Using Large Language Models.
   Angew Chem Int Ed Engl. 2025 May;64(19):e202423950. doi: 10.1002/anie.202423950. Epub 2025 Mar 22.