Foppa Lucas, Scheffler Matthias
The NOMAD Laboratory at the Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, D-14195 Berlin, Germany
Digit Discov. 2025 Jun 25. doi: 10.1039/d5dd00174a.
Useful materials are often statistically exceptional, and they might be overlooked by artificial intelligence (AI) models that attempt to describe all materials simultaneously. These global models perform well for the majority of materials, but they do not necessarily capture the useful ones. Subgroup discovery (SGD) identifies descriptions of subsets of materials associated with exceptional values of a chosen property. Thus, SGD can capture exceptional materials better than widely used AI techniques. Previous studies focused on the subgroup (SG) that maximizes an objective function establishing a tradeoff between the SG size and the exceptionality of the distribution of property values within the SG. However, this optimization does not yield a unique solution: many SGs typically have similar objective-function values. Here, we identify a "Pareto region" of SGD solutions presenting a multitude of size-exceptionality tradeoffs. The approach is demonstrated by learning descriptions of perovskites with a high bulk modulus.
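The size-exceptionality tradeoff can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate rules, their coverage and utility values, and the product-form objective are all hypothetical, and a strict Pareto-front filter stands in for the "Pareto region" of the abstract, whose exact definition is not given here. The sketch only shows why a single objective returns one SG while a two-objective view keeps several non-dominated ones.

```python
from typing import List, NamedTuple


class Subgroup(NamedTuple):
    rule: str        # hypothetical SG description (placeholder label)
    coverage: float  # size term: fraction of the data set inside the SG
    utility: float   # exceptionality of the property distribution in the SG


def pareto_front(candidates: List[Subgroup]) -> List[Subgroup]:
    """Keep SGs that no other SG beats in both coverage and utility."""
    front = []
    for c in candidates:
        dominated = any(
            o.coverage >= c.coverage
            and o.utility >= c.utility
            and (o.coverage > c.coverage or o.utility > c.utility)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda s: s.coverage)


# Made-up candidates, for illustration only.
candidates = [
    Subgroup("rule A", 0.50, 0.20),
    Subgroup("rule B", 0.30, 0.45),
    Subgroup("rule C", 0.10, 0.80),
    Subgroup("rule D", 0.25, 0.40),  # dominated by rule B
    Subgroup("rule E", 0.05, 0.60),  # dominated by rule C
]

# Assumed single objective: coverage times utility (one possible tradeoff).
best = max(candidates, key=lambda s: s.coverage * s.utility)

front = pareto_front(candidates)
print("single-objective optimum:", best.rule)
print("Pareto front:", [s.rule for s in front])
```

With these made-up numbers the single objective selects only rule B, while the front also retains the small, highly exceptional rule C and the large, mildly exceptional rule A.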