Interpretable Machine Learning in Solid-State Chemistry, with Applications to Perovskites, Spinels, and Rare-Earth Intermetallics: Finding Descriptors Using Decision Trees.

Affiliations

Department of Chemistry, University of Alberta, Edmonton, Alberta T6G 2G2, Canada.

Department of Chemistry and Biochemistry, Manhattan College, Riverdale, New York 10471, United States.

Publication information

Inorg Chem. 2023 Jul 17;62(28):10865-10875. doi: 10.1021/acs.inorgchem.3c01153. Epub 2023 Jun 30.

Abstract

Machine-learning methods have exciting potential to aid materials discovery, but their wider adoption can be hindered by the opaqueness of many models. Even if these models are accurate, the inability to understand the basis for the predictions breeds skepticism. Thus, it is imperative to develop machine-learning models that are explainable and interpretable so that researchers can judge for themselves if the predictions are consistent with their own scientific understanding and chemical insight. In this spirit, the sure independence screening and sparsifying operator (SISSO) method was recently proposed as an effective way to identify the simplest combination of chemical descriptors needed to solve classification and regression problems in materials science. This approach uses domain overlap (DO) as the criterion to find the most informative descriptors in classification problems, but sometimes a low score can be assigned to useful descriptors when there are outliers or when samples belonging to a class are clustered in different regions of the feature space. Here, we present a hypothesis that the performance can be improved by implementing decision trees (DT) instead of DO as the scoring function to find the best descriptors. This modified approach was tested on three important structural classification problems in solid-state chemistry: perovskites, spinels, and rare-earth intermetallics. In all cases, the DT scoring gave better features and significantly improved accuracies of ≥0.91 for the training sets and ≥0.86 for the test sets.
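The failure mode described above, where the samples of one class sit in two separate clusters along a descriptor, can be illustrated with a small sketch. The Python below is not the authors' implementation. It assumes a two-class problem and a single one-dimensional descriptor, uses a simplified domain-overlap score (the number of samples inside the intersection of the two classes' [min, max] intervals, lower is better), and takes the decision-tree score to be the training accuracy of a greedy depth-2 Gini tree. The depth, the use of training rather than cross-validated accuracy, the helper names (`domain_overlap`, `fit_tree`, `dt_score`) and the toy data `X1`, `X2`, `Y` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical toy data: along X1, class "A" forms two clusters (low and high)
# with class "B" in between; along X2 the two classes interleave near the middle.
Y = ["A"] * 6 + ["B"] * 6
X1 = [0.10, 0.15, 0.20, 0.90, 0.95, 1.00, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
X2 = [0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.50, 0.70, 0.90, 1.00, 1.10, 1.20]


def domain_overlap(x, y):
    """Simplified 1-D domain-overlap score for two classes (lower is better).

    Assumption: each class's domain is its [min, max] interval (the 1-D analogue
    of a convex hull); the score counts samples inside the two domains' intersection.
    """
    classes = sorted(set(y))
    assert len(classes) == 2, "this sketch handles exactly two classes"
    lows = [min(v for v, t in zip(x, y) if t == c) for c in classes]
    highs = [max(v for v, t in zip(x, y) if t == c) for c in classes]
    left, right = max(lows), min(highs)
    if left > right:  # the domains do not intersect
        return 0
    return sum(1 for v in x if left <= v <= right)


def gini(labels):
    """Gini impurity of a non-empty sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_tree(x, y, depth):
    """Greedy CART-style tree on one descriptor, choosing splits by Gini impurity.

    Returns ("leaf", label) or ("split", threshold, left_subtree, right_subtree).
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return ("leaf", majority)
    pairs = sorted(zip(x, y))
    best = None  # (weighted impurity, threshold, split index)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal descriptor values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:  # all descriptor values identical
        return ("leaf", majority)
    _, threshold, i = best
    lx, ly = zip(*pairs[:i])
    rx, ry = zip(*pairs[i:])
    return ("split", threshold,
            fit_tree(lx, ly, depth - 1), fit_tree(rx, ry, depth - 1))


def predict(tree, v):
    """Follow thresholds down to a leaf and return its class label."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if v <= threshold else right
    return tree[1]


def dt_score(x, y, max_depth=2):
    """Training accuracy of a depth-limited tree fit on descriptor x (higher is better)."""
    tree = fit_tree(x, y, max_depth)
    return sum(predict(tree, v) == t for v, t in zip(x, y)) / len(y)


if __name__ == "__main__":
    for name, x in (("X1", X1), ("X2", X2)):
        print(name, "DO:", domain_overlap(x, Y), "DT:", round(dt_score(x, Y), 3))
```

On this toy data, X1 separates the classes perfectly with two thresholds, yet it receives the worse overlap count (6 versus 4 for X2) because the interval spanned by class A swallows class B entirely. The depth-2 tree scores X1 at 1.0 and X2 below 1.0, since the labels along X2 change too often to be captured by at most four leaves. When ranking many candidate descriptors, cross-validated accuracy would be a reasonable substitute for training accuracy to reduce the risk of rewarding overfit trees.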
