Zou Hao, Zhao Haochen, Lu Mingming, Wang Jiong, Deng Zeyu, Wang Jianxin
School of Computer Science and Engineering, Central South University, Changsha, China.
Xiangjiang Laboratory, Changsha, China.
Nat Commun. 2025 Jan 2;16(1):203. doi: 10.1038/s41467-024-55525-y.
Machine learning offers a promising avenue for expediting the discovery of new compounds by accurately predicting their thermodynamic stability. Compared with traditional experimental and modeling methods, this approach offers significant savings in time and resources. However, most existing models are built on specific domain knowledge, which can introduce biases that degrade their performance. Here, we propose a machine learning framework rooted in electron configuration, further enhanced through stacked generalization with two additional models grounded in diverse domain knowledge. Experimental results validate the efficacy of our model in predicting the stability of compounds, achieving an area under the curve (AUC) score of 0.988. Notably, our model is highly sample-efficient, requiring only one-seventh of the data used by existing models to achieve the same performance. To underscore the versatility of our approach, we present three illustrative examples showcasing its effectiveness in navigating unexplored composition space. We present two case studies demonstrating that our method can facilitate the exploration of new two-dimensional wide bandgap semiconductors and double perovskite oxides. Validation by first-principles calculations indicates that our method identifies stable compounds with remarkable accuracy.
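The abstract names stacked generalization as the way the electron-configuration model is combined with two other models, and reports the result as an AUC score. As a rough illustration of that general technique (not the authors' implementation), the sketch below trains three base learners on different feature "views", fits a meta-learner on their out-of-fold predictions, and scores the stack with a rank-based ROC AUC. Everything specific here is an assumption: the base and meta learners are plain logistic regressions, `VIEWS`, `make_data`, `fit_stack`, and the other helpers are hypothetical names, and the data are synthetic rather than materials descriptors.

```python
"""Hypothetical sketch of stacked generalization scored by ROC AUC.

Assumptions (not the paper's method): three base learners are 1-D logistic
regressions, each seeing a different hypothetical feature "view"; the
meta-learner is also logistic regression; the data are synthetic.
"""
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
VIEWS: List[List[int]] = [[0], [1], [2]]  # hypothetical: feature indices per base model


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Vector, x: Sequence[float]) -> float:
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logreg(X: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 200) -> Vector:
    """Batch gradient descent on log loss; returns [bias, w1, ..., wd]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def out_of_fold(X: List[Vector], y: List[int], view: List[int], k: int = 5) -> Vector:
    # Level-0 prediction for each sample comes from a model that never saw it.
    n = len(X)
    oof = [0.0] * n
    for f in range(k):
        held = set(range(f, n, k))
        train = [i for i in range(n) if i not in held]
        w = fit_logreg([[X[i][j] for j in view] for i in train], [y[i] for i in train])
        for i in held:
            oof[i] = predict(w, [X[i][j] for j in view])
    return oof


def fit_stack(X: List[Vector], y: List[int], views: List[List[int]], k: int = 5):
    # Level-1 features: one out-of-fold probability per base model.
    columns = [out_of_fold(X, y, v, k) for v in views]
    meta_X = [list(row) for row in zip(*columns)]
    meta_w = fit_logreg(meta_X, y)
    # Refit each base model on all training data for use at prediction time.
    base_ws = [fit_logreg([[x[j] for j in v] for x in X], y) for v in views]
    return base_ws, meta_w


def predict_stack(model: Tuple[List[Vector], Vector], views: List[List[int]], x: Vector) -> float:
    base_ws, meta_w = model
    z = [predict(w, [x[j] for j in v]) for w, v in zip(base_ws, views)]
    return predict(meta_w, z)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(n: int, seed: int) -> Tuple[List[Vector], List[int]]:
    # Synthetic: each of 3 features is a noisy, partial signal of the label.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        X.append([rng.gauss(1.0 if label else -1.0, 1.5) for _ in range(3)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_data(300, seed=0)
    X_te, y_te = make_data(200, seed=1)
    model = fit_stack(X_tr, y_tr, VIEWS)
    scores = [predict_stack(model, VIEWS, x) for x in X_te]
    print(f"stacked AUC on held-out data: {roc_auc(y_te, scores):.3f}")
```

The essential design choice in stacking is that the meta-learner is fit on out-of-fold predictions, so it learns how far to trust each base model on samples that model did not train on; fitting it on in-sample predictions would instead reward whichever base model overfits most.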