Marchenko E I, Korolev V V, Kobeleva E A, Belich N A, Udalova N N, Eremin N N, Goodilin E A, Tarasov A B
Laboratory of New Materials for Solar Energetics, Department of Materials Science, Lomonosov Moscow State University, 1 Lenin Hills, 119991, Moscow, Russia.
Department of Geology, Lomonosov Moscow State University, 1 Lenin Hills, 119991, Moscow, Russia.
Nanoscale. 2025 Jan 29;17(5):2742-2752. doi: 10.1039/d4nr04531a.
Identification of crystal structures is a crucial stage in the exploration of novel functional materials. This procedure is usually time-consuming and prone to false-positive or false-negative results. It requires a significant level of expertise in crystallography and, especially, deep experience with the perovskite-related structures of hybrid perovskites. Our work is devoted to the machine learning (ML) classification of structure types of hybrid lead halides based on available X-ray diffraction (XRD) data. Here, we propose a simple approach for quickly identifying the dimensionality of inorganic substructures, the types of connection of lead halide polyhedra, and structure types using common powder XRD data and an ML decision tree classification model. Based on theoretically calculated XRD patterns for the 14 most common structure types, the average accuracy of our ML algorithm in predicting the dimensionality of inorganic substructures, the type of connection of lead halide polyhedra, and the inorganic substructure topology reached 0.76 ± 0.07, 0.827 ± 0.028 and 0.71 ± 0.05, respectively. To test the transferability of the developed ML model, we expanded our dataset to 30 structure types. For these 30 structure types, the corresponding average accuracies based on theoretically calculated XRD patterns reached 0.820 ± 0.022, 0.74 ± 0.05 and 0.633 ± 0.018, respectively. Validation of our decision tree classification model on experimental XRD data gave accuracies of 1.0 and 0.82 for dimensionality and structure type prediction, respectively. Thus, our approach can significantly simplify and accelerate the interpretation of highly complicated XRD data for hybrid lead halides.
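The abstract does not state how the XRD patterns are featurized, how the tree is configured, or how the "±" values are obtained. The following is therefore only a minimal sketch of the general idea: a powder pattern is turned into a feature vector, and a decision tree predicts a class such as inorganic-substructure dimensionality. Everything specific here is an assumption and is not the authors' pipeline. That includes binning 2θ from 5° to 50° into 1° bins, a Gini-impurity CART tree written from scratch in pure Python (in place of any particular library) with `max_depth=4`, and reporting accuracy as mean ± sample standard deviation over 5 shuffled folds. The toy "2D"/"3D" peak lists and all function names (`bin_pattern`, `synthetic_pattern`, etc.) are also hypothetical.

```python
import random
from statistics import mean, stdev


def bin_pattern(peaks, two_theta_min=5.0, two_theta_max=50.0, n_bins=45):
    """Hypothetical featurization: sum (2theta, intensity) peaks into fixed
    2theta bins and normalize so the strongest bin equals 1.0."""
    width = (two_theta_max - two_theta_min) / n_bins
    vec = [0.0] * n_bins
    for two_theta, intensity in peaks:
        if two_theta_min <= two_theta < two_theta_max:
            i = min(int((two_theta - two_theta_min) / width), n_bins - 1)
            vec[i] += intensity
    top = max(vec) or 1.0
    return [v / top for v in vec]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best axis-aligned split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=4):
    """Grow a CART-style tree: leaves are labels, nodes are (feature, threshold, left, right)."""
    majority = max(sorted(set(y)), key=y.count)
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # every feature vector is identical; nothing to split on
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, x):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean and sample standard deviation of accuracy over k shuffled folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tree = build_tree([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(tree, X[i]) == y[i] for i in fold) / len(fold))
    return mean(scores), stdev(scores)


def synthetic_pattern(label, rng):
    """Toy peaks only (assumption): '2D' gets a low-angle basal series, '3D' does not."""
    if label == "2D":
        d = rng.uniform(6.0, 7.0)  # first low-angle peak position, degrees 2theta
        return [(d * n, 100.0 / n) for n in range(1, 5)]
    return [(rng.uniform(13.5, 14.5), 100.0), (rng.uniform(27.5, 28.5), 60.0)]


rng = random.Random(42)
labels = ["2D", "3D"] * 20
X = [bin_pattern(synthetic_pattern(lab, rng)) for lab in labels]
acc_mean, acc_std = cross_val_accuracy(X, labels, k=5)
print(f"dimensionality accuracy = {acc_mean:.2f} +/- {acc_std:.2f}")
```

One reason a decision tree suits this kind of task is that each internal node reads as a human-checkable rule ("normalized intensity in 2θ bin *f* is above *t*"). On this deliberately easy toy set, a single low-angle bin separates the classes. Real simulated or experimental patterns, with overlapping reflections and many more structure types, would need richer features and a deeper tree.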