A Design-to-Device Pipeline for Data-Driven Materials Discovery.

Affiliations

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge CB3 0HE, U.K.

ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0QX, U.K.

Publication information

Acc Chem Res. 2020 Mar 17;53(3):599-610. doi: 10.1021/acs.accounts.9b00470. Epub 2020 Feb 25.

PMID:32096410
Abstract

The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet nearly all functional materials are still discovered by "trial-and-error", and this lack of predictability creates a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change can realistically be achieved only by adopting an entirely new approach to materials discovery. Fortunately, a fundamentally new approach has been emerging, whereby data science with artificial intelligence offers a prospective solution to shorten these average molecule-to-market lead times.

This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language-processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities, which can aid data mining and pattern recognition and also generate their own data from calculations, are now within our grasp. These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies that systematically design and predict new chemicals for a given device application.

This Account shows how data science can enable materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged using "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and are then enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of the relationships between chemical structures and physical properties that are known to deliver functional materials. These encodings may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. The case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account focuses on applications in the physical sciences, the generic pipeline is readily transferable to other scientific disciplines such as biology and medicine.
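The four steps of the design-to-device pipeline can be illustrated with a deliberately small Python sketch. It is not ChemDataExtractor and not the authors' implementation. A single hypothetical regular expression stands in for chemistry-aware text mining, and a hand-written dictionary stands in for machine-learning or quantum-chemical enrichment. The compound names, band-gap values, the 1.1 to 1.7 eV selection window and the 1.4 eV ranking target are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a chemistry-aware NLP tool such as ChemDataExtractor:
# it only recognises phrases of the form "band gap of <compound> is <value> eV".
BAND_GAP = re.compile(
    r"band gap of (?P<compound>[A-Za-z0-9()]+) is (?P<value>\d+(?:\.\d+)?) eV"
)


def extract(sentences):
    """Step 1, data extraction: free text -> (compound, property, value) records."""
    records = []
    for sentence in sentences:
        for m in BAND_GAP.finditer(sentence):
            records.append((m["compound"], "band_gap_eV", float(m["value"])))
    return records


def build_database(records):
    """Group records by compound; repeated reports of a property are averaged."""
    reports = defaultdict(lambda: defaultdict(list))
    for compound, prop, value in records:
        reports[compound][prop].append(value)
    return {
        compound: {prop: sum(vals) / len(vals) for prop, vals in props.items()}
        for compound, props in reports.items()
    }


def enrich(database, calculated):
    """Step 2, data enrichment: merge properties obtained from another source.

    `calculated` is a placeholder for values that ML models or high-throughput
    quantum-chemical calculations would supply in a real pipeline.
    """
    for compound, props in calculated.items():
        database.setdefault(compound, {}).update(props)
    return database


def predict(database, rule):
    """Step 3, material prediction: keep compounds satisfying a structure-property rule."""
    return [c for c, props in database.items() if rule(props)]


def shortlist(candidates, database, score, k):
    """Rank candidates by score and keep the top k for Step 4, experimental validation."""
    return sorted(candidates, key=lambda c: score(database[c]), reverse=True)[:k]


# Toy inputs: placeholder compound names and illustrative values, not real data.
text = [
    "The band gap of AB2 is 3.1 eV.",
    "Here the band gap of CD is 1.5 eV.",
    "After annealing, the band gap of CD is 1.3 eV.",
    "The band gap of EF3 is 1.45 eV.",
    "No property is reported in this sentence.",
]
database = enrich(build_database(extract(text)), {"GH": {"band_gap_eV": 1.2}})

# Illustrative photovoltaic-style rule: band gap between 1.1 and 1.7 eV,
# ranked by closeness to an assumed 1.4 eV target.
candidates = predict(database, lambda p: 1.1 <= p.get("band_gap_eV", 0.0) <= 1.7)
leads = shortlist(candidates, database, lambda p: -abs(p["band_gap_eV"] - 1.4), k=2)
print(leads)  # ['CD', 'EF3']
```

Each stage is a plain function over a shared dictionary. One stage, such as the regex extractor, could in principle be swapped for a real tool without changing the others.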

Cited by

1. MechBERT: Language Models for Extracting Chemical and Property Relationships about Mechanical Stress and Strain. J Chem Inf Model. 2025 Feb 24;65(4):1873-1888. doi: 10.1021/acs.jcim.4c00857. Epub 2025 Jan 31.
2. MaTableGPT: GPT-Based Table Data Extractor from Materials Science Literature. Adv Sci (Weinh). 2025 Apr;12(16):e2408221. doi: 10.1002/advs.202408221. Epub 2025 Jan 24.
3. A Database of Stress-Strain Properties Auto-generated from the Scientific Literature using ChemDataExtractor. Sci Data. 2024 Nov 23;11(1):1273. doi: 10.1038/s41597-024-03979-6.
4. Predictive Modeling of High-Entropy Alloys and Amorphous Metallic Alloys Using Machine Learning. J Chem Inf Model. 2024 Oct 14;64(19):7313-7336. doi: 10.1021/acs.jcim.4c00873. Epub 2024 Oct 1.
5. Toward an AI Era: Advances in Electronic Skins. Chem Rev. 2024 Sep 11;124(17):9899-9948. doi: 10.1021/acs.chemrev.4c00049. Epub 2024 Aug 28.
6. Kriging-based surrogate data-enriching artificial neural network prediction of strength and permeability of permeable cement-stabilized base. Nat Commun. 2024 Jun 7;15(1):4891. doi: 10.1038/s41467-024-48766-4.
7. Informatics-Driven Design of Superhard B-C-O Compounds. ACS Appl Mater Interfaces. 2024 Feb 28;16(8):10372-10379. doi: 10.1021/acsami.3c18105. Epub 2024 Feb 17.
8. Automatically Generated Datasets: Present and Potential Self-Cleaning Coating Materials. Sci Data. 2024 Jan 31;11(1):146. doi: 10.1038/s41597-024-02983-0.
9. A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor. Sci Data. 2024 Jan 17;11(1):80. doi: 10.1038/s41597-023-02897-3.
10. New Insights on Designing the Next-Generation Materials for Electrochemical Synthesis of Reactive Oxidative Species Towards Efficient and Scalable Water Treatment: A Review and Perspectives. J Environ Chem Eng. 2023 Dec;11(6). doi: 10.1016/j.jece.2023.111384. Epub 2023 Nov 3.