Huo Sitong, Zhang Shuqing, Wu Qilin, Zhang Xinping
Institute of Information Photonics Technology, School of Physics and Optoelectronic Engineering, Beijing University of Technology, Beijing 100124, China.
Nanomaterials (Basel). 2024 Feb 28;14(5):445. doi: 10.3390/nano14050445.
The band gap is a key parameter in semiconductor materials that is essential for advancing optoelectronic device development. Accurately predicting band gaps of materials at low cost is a significant challenge in materials science. Although many machine learning (ML) models for band gap prediction already exist, they often suffer from low interpretability and lack theoretical support from a physical perspective. In this study, we address these challenges by using a combination of traditional ML algorithms and the 'white-box' sure independence screening and sparsifying operator (SISSO) approach. Specifically, we enhance the interpretability and accuracy of band gap predictions for binary semiconductors by integrating the importance rankings of support vector regression (SVR), random forests (RF), and gradient boosting decision trees (GBDT) with SISSO models. Our model uses only the intrinsic features of the constituent elements and their band gaps calculated using the Perdew-Burke-Ernzerhof method, significantly reducing computational demands. We have applied our model to predict the band gaps of 1208 theoretically stable binary compounds. Importantly, the model highlights the critical role of electronegativity in determining material band gaps. This insight not only enriches our understanding of the physical principles underlying band gap prediction but also underscores the potential of our approach in guiding the synthesis of new and valuable semiconductor materials.
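The abstract does not spell out how SISSO selects its descriptors, so the following is only a minimal sketch of the sure independence screening (SIS) step, under stated assumptions. Candidate features are built from a few primary features with simple operators, then ranked by the absolute value of their correlation with the target. The feature names (`chi_A`, `chi_B`, `r_A`), the operator set, the synthetic data, and the helper functions are all hypothetical and are not taken from the paper. The toy target is deliberately constructed from an electronegativity difference, so the screening recovers that feature by design; it demonstrates the mechanics only and is not evidence for the paper's finding.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def build_candidates(primary):
    """Expand primary features with pairwise |a-b| and a*b combinations.

    The operator set is an illustrative assumption; a real SISSO run
    typically uses a larger set and applies it recursively.
    """
    cands = dict(primary)
    names = sorted(primary)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            xa, xb = primary[a], primary[b]
            cands[f"|{a}-{b}|"] = [abs(p - q) for p, q in zip(xa, xb)]
            cands[f"{a}*{b}"] = [p * q for p, q in zip(xa, xb)]
    return cands


def sis(cands, target, k):
    """Keep the k candidates most correlated (in absolute value) with target."""
    ranked = sorted(
        cands,
        key=lambda name: abs(pearson(cands[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Synthetic toy data (NOT from the paper): hypothetical electronegativities
# of elements A and B and a hypothetical radius of A.
rng = random.Random(0)
n = 200
primary = {
    "chi_A": [rng.uniform(0.8, 2.0) for _ in range(n)],
    "chi_B": [rng.uniform(2.0, 4.0) for _ in range(n)],
    "r_A": [rng.uniform(0.5, 2.5) for _ in range(n)],
}
# Toy "band gap" built from the electronegativity difference plus small noise.
target = [
    1.5 * abs(a - b) + rng.gauss(0, 0.05)
    for a, b in zip(primary["chi_A"], primary["chi_B"])
]

top = sis(build_candidates(primary), target, k=3)
print(top)
```

In full SISSO, the screened subset is then passed to a sparsifying operator that searches for the best low-dimensional linear model over those candidates. That second step is omitted here to keep the sketch short.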