Department of Materials Science and Engineering, University of California, Berkeley, CA, 94720, USA.
Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA.
Sci Data. 2022 May 25;9(1):231. doi: 10.1038/s41597-022-01317-2.
The development of a materials synthesis route is usually based on heuristics and experience. A possible alternative is to apply data-driven methods that learn synthesis patterns from past experience and use them to predict the syntheses of novel materials. However, this route is impeded by the lack of a large-scale database of synthesis formulations. In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis procedures extracted from the scientific literature. Each procedure contains essential synthesis information, including the precursors and target materials, their quantities, and the synthesis actions and corresponding attributes. Every procedure is also augmented with the reaction formula. Through this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis procedures.
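To make the description of a procedure record concrete, the following is a minimal Python sketch of how one entry might be represented in memory. It assumes only what the abstract states: each procedure has precursors and target materials with quantities, a sequence of synthesis actions with attributes, and a reaction formula. All class names, field names, and the helper `precursor_formulas` are hypothetical and are not the published dataset's schema. The sample values (a simple AgCl precipitation) are invented for illustration and are not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Material:
    """A precursor or target material (hypothetical structure)."""
    formula: str
    quantity: Optional[str] = None  # free-text amount; None if not reported


@dataclass
class Action:
    """One synthesis step with its attributes (hypothetical structure)."""
    kind: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SynthesisProcedure:
    """One solution-based synthesis procedure (hypothetical structure)."""
    targets: List[Material]
    precursors: List[Material]
    actions: List[Action]
    reaction: str  # reaction formula attached to the procedure


def precursor_formulas(proc: SynthesisProcedure) -> List[str]:
    """Return the precursor formulas of a procedure in sorted order."""
    return sorted(m.formula for m in proc.precursors)


# Invented illustrative record, not an entry from the dataset.
example = SynthesisProcedure(
    targets=[Material("AgCl")],
    precursors=[
        Material("AgNO3", quantity="10 mmol"),
        Material("NaCl", quantity="10 mmol"),
    ],
    actions=[
        Action("mix", {"medium": "water"}),
        Action("filter"),
        Action("dry", {"temperature": "60 C"}),
    ],
    reaction="AgNO3 + NaCl -> AgCl + NaNO3",
)

print(precursor_formulas(example))
```

A structure along these lines would let a downstream model query, for example, which precursors co-occur with a given target, although the actual field layout of the released dataset should be taken from its own documentation.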