Identifying domains of applicability of machine learning models for materials science.

Affiliations

NOMAD Laboratory, Fritz Haber Institute of the Max Planck Society, Berlin, Germany.

Faculty of IT, Monash University, Clayton, VIC 3800, Australia.

Publication information

Nat Commun. 2020 Sep 4;11(1):4428. doi: 10.1038/s41467-020-17112-9.

DOI:10.1038/s41467-020-17112-9

PMID:32887879

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7474068/

Abstract

Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.
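The idea of a domain of applicability can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function name `find_subgroup`, the restriction to single-threshold selectors, and the quality function (coverage multiplied by relative error reduction) are all assumptions made for illustration, since the abstract does not specify them. The sketch searches for a simple, interpretable condition on the input features under which a model's mean error is much lower than its global average.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Subgroup:
    feature: str       # feature the selector is defined on
    op: str            # "<=" or ">"
    threshold: float   # cut point of the selector
    coverage: float    # fraction of samples selected
    mean_error: float  # mean absolute error inside the subgroup
    impact: float      # coverage * relative error reduction (assumed objective)


def find_subgroup(rows, errors, min_coverage=0.1):
    """Exhaustively search single-threshold selectors for the best subgroup.

    rows   : list of dicts mapping feature name -> numeric value
    errors : per-sample absolute errors of some already-trained model
    """
    global_error = mean(errors)
    if global_error <= 0:
        raise ValueError("global error must be positive")
    n = len(rows)
    best = None
    for feature in rows[0]:
        cut_points = sorted({row[feature] for row in rows})[:-1]
        for threshold in cut_points:
            for op in ("<=", ">"):
                if op == "<=":
                    idx = [i for i, r in enumerate(rows) if r[feature] <= threshold]
                else:
                    idx = [i for i, r in enumerate(rows) if r[feature] > threshold]
                coverage = len(idx) / n
                if coverage < min_coverage:
                    continue
                sub_error = mean(errors[i] for i in idx)
                # Reward large subgroups whose error is well below the average.
                impact = coverage * (1.0 - sub_error / global_error)
                if best is None or impact > best.impact:
                    best = Subgroup(feature, op, threshold, coverage, sub_error, impact)
    return best


# Hypothetical toy data: the model is accurate only where x is small.
rows = [
    {"x": 0.1, "y": 0.9}, {"x": 0.2, "y": 0.1},
    {"x": 0.3, "y": 0.5}, {"x": 0.4, "y": 0.7},
    {"x": 0.6, "y": 0.2}, {"x": 0.7, "y": 0.8},
    {"x": 0.8, "y": 0.4}, {"x": 0.9, "y": 0.6},
]
errors = [0.01, 0.02, 0.01, 0.02, 0.20, 0.25, 0.22, 0.27]

best = find_subgroup(rows, errors)
print(f"global mean error: {mean(errors):.3f}")
print(f"selector: {best.feature} {best.op} {best.threshold}")
print(f"coverage: {best.coverage:.2f}, subgroup mean error: {best.mean_error:.3f}")
```

On this toy data the average error alone would suggest a mediocre model, while the selected subgroup shows a region of feature space where it performs far better. A realistic analysis would presumably search conjunctions of several conditions and evaluate the subgroup on held-out data.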

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b169/7474068/3e1a01b3bad5/41467_2020_17112_Fig1_HTML.jpg

Similar articles

Identifying domains of applicability of machine learning models for materials science.

Nat Commun. 2020 Sep 4;11(1):4428. doi: 10.1038/s41467-020-17112-9.

Data-driven modeling and prediction of blood glucose dynamics: Machine learning applications in type 1 diabetes.

Artif Intell Med. 2019 Jul;98:109-134. doi: 10.1016/j.artmed.2019.07.007. Epub 2019 Jul 26.

Data-driven machine learning model for the prediction of oxygen vacancy formation energy of metal oxide materials.

Phys Chem Chem Phys. 2021 Jul 28;23(29):15675-15684. doi: 10.1039/d1cp02066h.

A Design-to-Device Pipeline for Data-Driven Materials Discovery.

Acc Chem Res. 2020 Mar 17;53(3):599-610. doi: 10.1021/acs.accounts.9b00470. Epub 2020 Feb 25.

The structural similarity index for IMRT quality assurance: radiomics-based error classification.

Med Phys. 2021 Jan;48(1):80-93. doi: 10.1002/mp.14559. Epub 2020 Nov 27.

Improved defined approaches for predicting skin sensitization hazard and potency in humans.

ALTEX. 2019;36(3):363-372. doi: 10.14573/altex.1809191. Epub 2019 Jan 23.

Dissecting Machine-Learning Prediction of Molecular Activity: Is an Applicability Domain Needed for Quantitative Structure-Activity Relationship Models Based on Deep Neural Networks?

J Chem Inf Model. 2019 Jan 28;59(1):117-126. doi: 10.1021/acs.jcim.8b00348. Epub 2018 Nov 21.

General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.

J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.

Detecting MLC modeling errors using radiomics-based machine learning in patient-specific QA with an EPID for intensity-modulated radiation therapy.

Med Phys. 2021 Mar;48(3):991-1002. doi: 10.1002/mp.14699. Epub 2021 Jan 27.

Rethinking Giftedness and Gifted Education: A Proposed Direction Forward Based on Psychological Science.

Psychol Sci Public Interest. 2011 Jan;12(1):3-54. doi: 10.1177/1529100611418056.

Cited by

Modeling Time-On-Stream Catalyst Reactivity in the Selective Hydrogenation of Concentrated Acetylene Streams under Industrial Conditions via Experiments and AI.

ACS Catal. 2025 Jul 11;15(15):12652-12665. doi: 10.1021/acscatal.5c02226. eCollection 2025 Aug 1.

Artificial-intelligence-assisted design principle for developing high-performance single-atom catalysts.

Innovation (Camb). 2025 Apr 17;6(7):100911. doi: 10.1016/j.xinn.2025.100911. eCollection 2025 Jul 7.

Coherent collections of rules describing exceptional materials identified with a multi-objective optimization of subgroups.

Digit Discov. 2025 Jun 25. doi: 10.1039/d5dd00174a.

Identifying Key Factors Influencing Advanced Hydrogen Evolution Reaction Catalysts.

JACS Au. 2025 Jun 4;5(6):2762-2769. doi: 10.1021/jacsau.5c00339. eCollection 2025 Jun 23.

Enhancing energy predictions in multi-atom systems with multiscale topological learning.

J Mater Chem A Mater. 2025 Jun 5. doi: 10.1039/d5ta02687c.

Prediction and optimization of stretch flangeability of advanced high strength steels utilizing machine learning approaches.

Sci Rep. 2025 May 10;15(1):16296. doi: 10.1038/s41598-025-00786-w.

Intelligent design and synthesis of energy catalytic materials.

Fundam Res. 2023 Dec 10;5(2):624-639. doi: 10.1016/j.fmre.2023.10.012. eCollection 2025 Mar.

Band-Gap Regression with Architecture-Optimized Message-Passing Neural Networks.

Chem Mater. 2025 Feb 12;37(4):1358-1369. doi: 10.1021/acs.chemmater.4c01988. eCollection 2025 Feb 25.

Predicting electronic screening for fast Koopmans spectral functional calculations.

NPJ Comput Mater. 2024;10(1):299. doi: 10.1038/s41524-024-01484-3. Epub 2024 Dec 20.

UNIQUE: A Framework for Uncertainty Quantification Benchmarking.

J Chem Inf Model. 2024 Nov 25;64(22):8379-8386. doi: 10.1021/acs.jcim.4c01578. Epub 2024 Nov 14.
