Xu Yibin, Wu Yen-Ju, Li Huiping, Fang Lei, Hayashi Shigenobu, Oishi Ayako, Shimizu Natsuko, Caputo Riccarda, Villars Pierre
Center for Basic Research on Materials, National Institute for Materials Science, Tsukuba, Japan.
Research Center for Energy and Environmental Materials, National Institute for Materials Science, Tsukuba, Japan.
Sci Technol Adv Mater. 2024 Sep 11;25(1):2403328. doi: 10.1080/14686996.2024.2403328. eCollection 2024.
Data-driven materials research, which uses machine learning for property prediction and materials design, requires materials data that are large in quantity, wide in variety, and high in quality. Battery materials are commonly polycrystals, ceramics, and composites, so multiscale data on substances, materials, and batteries are required. In this work, we develop a data network composed of three interlinked databases. It provides comprehensive data on substances (such as crystal structures and electronic structures), on materials (such as chemical composition, structure, and properties), and on batteries (such as battery composition, operating conditions, and capacity). The data are extracted from research papers on solid electrolytes and cathode materials, which were selected by screening more than 330 thousand papers with natural language processing tools. Data extraction and curation are carried out by editors who specialize in materials science and are trained in data standardization.
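The idea of three interlinked databases can be illustrated with a minimal sketch. The abstract does not describe the actual schema, so every class name, field name, and identifier below is a hypothetical assumption, and all values are placeholders rather than data from the network. The sketch only shows how a battery record might be traced through its materials down to the underlying substances.

```python
from dataclasses import dataclass, field


# Hypothetical record types; the real schema is not given in the abstract.
@dataclass
class Substance:
    substance_id: str
    formula: str
    crystal_structure: str


@dataclass
class Material:
    material_id: str
    composition: str
    substance_ids: list  # links into the substance database
    properties: dict = field(default_factory=dict)


@dataclass
class Battery:
    battery_id: str
    material_ids: list  # links into the material database
    operating_conditions: dict
    capacity: float


def substances_in_battery(battery, materials, substances):
    """Follow battery -> material -> substance links, without duplicates."""
    found = []
    for material_id in battery.material_ids:
        for substance_id in materials[material_id].substance_ids:
            substance = substances[substance_id]
            if substance not in found:
                found.append(substance)
    return found


# Placeholder entries only; none of these values come from the paper.
substances = {
    "S1": Substance("S1", "placeholder-formula-1", "placeholder-structure-1"),
    "S2": Substance("S2", "placeholder-formula-2", "placeholder-structure-2"),
}
materials = {
    "M1": Material("M1", "placeholder cathode composition", ["S1"]),
    "M2": Material("M2", "placeholder electrolyte composition", ["S1", "S2"]),
}
battery = Battery("B1", ["M1", "M2"], {"temperature_c": 25.0}, capacity=0.0)

linked = substances_in_battery(battery, materials, substances)
print([s.substance_id for s in linked])
```

Storing identifiers instead of nested objects keeps each level independently queryable, which is one plausible way to link records across the three scales.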