Data-driven prediction of chemically relevant compositions in multi-component systems using tensor embeddings.

Author information

Hayashi Hiroyuki, Tanaka Isao

Affiliations

Department of Materials Science and Engineering, Kyoto University, Sakyo, Kyoto, 606-8501, Japan.

Nanostructures Research Laboratory, Japan Fine Ceramics Center, Nagoya, 456-8587, Japan.

Publication information

Sci Rep. 2025 Jan 9;15(1):1448. doi: 10.1038/s41598-024-85062-z.

DOI:10.1038/s41598-024-85062-z

PMID:39789059

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11718299/

Abstract

The discovery of novel materials is crucial for developing new functional materials. This study introduces a predictive model designed to forecast complex multi-component oxide compositions, leveraging data derived from simpler pseudo-binary systems. By applying tensor decomposition and machine learning techniques, we transformed pseudo-binary oxide compositions from the Inorganic Crystal Structure Database (ICSD) into tensor representations, capturing key chemical trends such as oxidation states and periodic positions. Tucker decomposition was utilized to extract tensor embeddings, which were used to train a Random Forest classifier. The model successfully predicted the existence probabilities of pseudo-ternary and quaternary oxides, with 84% and 52% of ICSD-registered compositions, respectively, achieving high scores. Our approach highlights the potential of leveraging simpler oxide data to predict more complex compositions, suggesting broader applicability to other material systems such as sulfides and nitrides.
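The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the tensor is laid out, so the toy tensor below (cation A × cation B × mixing-ratio slot, with 1 marking a reported composition) and all function names are assumptions made for illustration. The sketch uses a truncated higher-order SVD (HOSVD), one standard way to compute a Tucker decomposition, and treats the rows of each factor matrix as embeddings. Training a classifier on those embeddings (the abstract names a Random Forest) is omitted here.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index `mode`, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode product: contract the columns of `matrix` with axis `mode` of `tensor`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def truncated_hosvd(tensor, ranks):
    """Return (core, factors) of a Tucker model computed by truncated HOSVD."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        # Project the tensor onto each factor subspace to obtain the core.
        core = mode_product(core, factor.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and factor matrices."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out

# Hypothetical toy data, not taken from ICSD: 6 x 5 x 4 binary occupancy tensor.
rng = np.random.default_rng(0)
toy = (rng.random((6, 5, 4)) < 0.3).astype(float)

core, factors = truncated_hosvd(toy, ranks=(3, 3, 2))
cation_a_embeddings = factors[0]  # one 3-dimensional embedding per row of mode 0

# With full ranks the decomposition is exact, which is a useful sanity check.
full_core, full_factors = truncated_hosvd(toy, ranks=toy.shape)
```

In a full pipeline, embeddings such as `cation_a_embeddings` would be combined into feature vectors for candidate compositions and passed to a classifier. How the authors build those feature vectors is not stated in the abstract.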
