NOMAD Laboratory, Fritz Haber Institute of the Max Planck Society, Berlin, Germany.
Faculty of IT, Monash University, Clayton, VIC 3800, Australia.
Nat Commun. 2020 Sep 4;11(1):4428. doi: 10.1038/s41467-020-17112-9.
Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.
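The idea of a domain of applicability can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names `a` and `b`, the toy error values, and the scoring rule (coverage times relative error reduction) are assumptions chosen for illustration, and the search is restricted to single-threshold selectors rather than the conjunctions a full subgroup discovery algorithm would consider.

```python
def find_domain(features, errors, names):
    """Search selectors of the form 'x <= t' or 'x >= t' and return the one
    that maximizes coverage * relative reduction of the mean error.

    Illustrative assumption: this scoring rule and the single-threshold
    search space are simplifications, not the published method.
    """
    n = len(errors)
    global_err = sum(errors) / n
    # (score, selector, subgroup mean error, coverage); default: no subgroup
    best = (0.0, None, global_err, 1.0)
    for j, name in enumerate(names):
        # Candidate thresholds are the observed feature values,
        # so every candidate subgroup contains at least one point.
        for t in sorted({row[j] for row in features}):
            for op in ("<=", ">="):
                idx = [
                    i for i, row in enumerate(features)
                    if (row[j] <= t if op == "<=" else row[j] >= t)
                ]
                coverage = len(idx) / n
                sub_err = sum(errors[i] for i in idx) / len(idx)
                score = coverage * (global_err - sub_err) / global_err
                if score > best[0]:
                    best = (score, f"{name} {op} {t}", sub_err, coverage)
    return best


# Hypothetical toy data: two features per material and an absolute
# prediction error per material (arbitrary units).
features = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0),
            (5.0, 3.0), (6.0, 6.0), (7.0, 0.5), (8.0, 2.5)]
errors = [0.01, 0.02, 0.01, 0.02, 0.10, 0.12, 0.11, 0.09]

score, selector, sub_err, coverage = find_domain(features, errors, ("a", "b"))
print(f"global mean error: {sum(errors) / len(errors):.3f}")
print(f"selected domain:   {selector}")
print(f"coverage:          {coverage:.2f}")
print(f"domain mean error: {sub_err:.3f}")
```

On this toy data the average error alone would suggest a uniformly mediocre model, whereas the selected region `a <= 4.0` covers half of the points with a much lower mean error. Multiplying by coverage keeps the search from collapsing onto a handful of well-predicted points.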