Interpretable Machine Learning in Solid-State Chemistry, with Applications to Perovskites, Spinels, and Rare-Earth Intermetallics: Finding Descriptors Using Decision Trees.

Author affiliations

Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada.

Department of Chemistry and Biochemistry, Manhattan College, Riverdale, New York 10471, United States.

Publication information

Inorg Chem. 2023 Jul 17;62(28):10865-10875. doi: 10.1021/acs.inorgchem.3c01153. Epub 2023 Jun 30.

DOI:10.1021/acs.inorgchem.3c01153

PMID:37390482

Abstract

Machine-learning methods have exciting potential to aid materials discovery, but their wider adoption can be hindered by the opaqueness of many models. Even if these models are accurate, the inability to understand the basis for the predictions breeds skepticism. Thus, it is imperative to develop machine-learning models that are explainable and interpretable so that researchers can judge for themselves if the predictions are consistent with their own scientific understanding and chemical insight. In this spirit, the sure independence screening and sparsifying operator (SISSO) method was recently proposed as an effective way to identify the simplest combination of chemical descriptors needed to solve classification and regression problems in materials science. This approach uses domain overlap (DO) as the criterion to find the most informative descriptors in classification problems, but sometimes a low score can be assigned to useful descriptors when there are outliers or when samples belonging to a class are clustered in different regions of the feature space. Here, we present a hypothesis that the performance can be improved by implementing decision trees (DT) instead of DO as the scoring function to find the best descriptors. This modified approach was tested on three important structural classification problems in solid-state chemistry: perovskites, spinels, and rare-earth intermetallics. In all cases, the DT scoring gave better features and significantly improved accuracies of ≥0.91 for the training sets and ≥0.86 for the test sets.
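The contrast between the two scoring functions can be illustrated with a minimal sketch. This is not the authors' implementation and not the SISSO code: it assumes a single one-dimensional descriptor, two classes labeled 0 and 1, a simplified domain overlap defined as the fraction of samples inside the intersection of the two class ranges (lower is better), and a small greedy Gini-based tree of depth 2 scored by training accuracy (higher is better). The function names `domain_overlap` and `tree_accuracy` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def domain_overlap(x: Sequence[float], y: Sequence[int]) -> float:
    """Fraction of samples lying in the overlap of the two classes' 1D ranges."""
    a = [v for v, c in zip(x, y) if c == 0]
    b = [v for v, c in zip(x, y) if c == 1]
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    if lo > hi:  # the two class intervals do not intersect
        return 0.0
    return sum(1 for v in x if lo <= v <= hi) / len(x)


def _gini_cost(labels: List[int]) -> float:
    """Size-weighted Gini impurity of a node with binary labels."""
    ones = sum(labels)
    return 2.0 * ones * (len(labels) - ones) / len(labels)


def _tree_errors(points: List[Tuple[float, int]], depth: int) -> int:
    """Misclassified samples for a greedy tree grown on sorted (value, label) pairs."""
    labels = [c for _, c in points]
    errors = min(sum(labels), len(labels) - sum(labels))  # majority-vote leaf
    if depth == 0 or errors == 0:
        return errors
    best_cost, best_i = None, None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot place a threshold between equal values
        cost = _gini_cost(labels[:i]) + _gini_cost(labels[i:])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is None:
        return errors
    return (_tree_errors(points[:best_i], depth - 1)
            + _tree_errors(points[best_i:], depth - 1))


def tree_accuracy(x: Sequence[float], y: Sequence[int], max_depth: int = 2) -> float:
    """Training accuracy of a depth-limited tree on one descriptor."""
    points = sorted(zip(x, y))
    return 1.0 - _tree_errors(points, max_depth) / len(points)


# Toy case 1: class 0 is clustered in two separate regions around class 1.
x_clustered = [1, 2, 3, 5, 6, 7, 9, 10, 11]
y_clustered = [0, 0, 0, 1, 1, 1, 0, 0, 0]

# Toy case 2: class 0 is well separated except for one outlier at 10.
x_outlier = [1, 2, 3, 4, 6, 7, 8, 9, 10]
y_outlier = [0, 0, 0, 0, 1, 1, 1, 1, 0]

for name, x, y in [("clustered", x_clustered, y_clustered),
                   ("outlier", x_outlier, y_outlier)]:
    print(name, round(domain_overlap(x, y), 3), tree_accuracy(x, y))
# clustered 0.333 1.0
# outlier 0.444 1.0
```

In both toy cases the simplified overlap score penalizes the descriptor (0.333 and 0.444 instead of the ideal 0), even though two thresholds separate the classes perfectly. The actual SISSO domain overlap operates on convex hulls in higher-dimensional feature spaces, and the tree settings used in the paper may differ from those assumed here.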

Similar articles

Recent advances in the SISSO method and their implementation in the SISSO++ code.

J Chem Phys. 2023 Sep 21;159(11). doi: 10.1063/5.0156620.

Adsorption Enthalpies for Catalysis Modeling through Machine-Learned Descriptors.

Acc Chem Res. 2021 Jun 15;54(12):2741-2749. doi: 10.1021/acs.accounts.1c00153. Epub 2021 Jun 3.

Search for ABO3 Type Ferroelectric Perovskites with Targeted Multi-Properties by Machine Learning Strategies.

J Chem Inf Model. 2022 Nov 14;62(21):5038-5049. doi: 10.1021/acs.jcim.1c00566. Epub 2021 Aug 10.

Feature-Assisted Machine Learning for Predicting Band Gaps of Binary Semiconductors.

Nanomaterials (Basel). 2024 Feb 28;14(5):445. doi: 10.3390/nano14050445.

Prediction and Classification of Formation Energies of Binary Compounds by Machine Learning: An Approach without Crystal Structure Information.

ACS Omega. 2021 May 26;6(22):14533-14541. doi: 10.1021/acsomega.1c01517. eCollection 2021 Jun 8.

New tolerance factor to predict the stability of perovskite oxides and halides.

Sci Adv. 2019 Feb 8;5(2):eaav0693. doi: 10.1126/sciadv.aav0693. eCollection 2019 Feb.

Machine learning descriptors in materials chemistry used in multiple experimentally validated studies: Oliynyk elemental property dataset.

Data Brief. 2024 Feb 9;53:110178. doi: 10.1016/j.dib.2024.110178. eCollection 2024 Apr.

Discovery of Intermetallic Compounds from Traditional to Machine-Learning Approaches.

Acc Chem Res. 2018 Jan 16;51(1):59-68. doi: 10.1021/acs.accounts.7b00490. Epub 2017 Dec 15.

Enabling interpretable machine learning for biological data with reliability scores.

PLoS Comput Biol. 2023 May 26;19(5):e1011175. doi: 10.1371/journal.pcbi.1011175. eCollection 2023 May.

Cited by

Structural chemistry of intermetallic compounds for active site design in heterogeneous catalysis.

Chem Sci. 2025 Apr 21;16(20):8611-8636. doi: 10.1039/d5sc01810b. eCollection 2025 May 21.
