Kim Eun Ho, Gu Jun Hyeong, Lee June Ho, Kim Seong Hun, Kim Jaeseon, Shin Hyo Gyeong, Kim Shin Hyun, Lee Donghwa
Department of Materials Science and Engineering (MSE), and Division of Advanced Materials Science (AMS), Pohang University of Science and Technology (POSTECH), Pohang 37673, South Korea.
Institute for Convergence Research and Education in Advanced Technology (I_CREATE), Yonsei University, Incheon 21983, South Korea.
ACS Appl Mater Interfaces. 2024 Aug 21;16(33):43734-43741. doi: 10.1021/acsami.4c07851. Epub 2024 Aug 9.
Applying machine-learning techniques to imbalanced data sets is a significant challenge in materials science because the underrepresented characteristics of minority classes are often buried under the abundance of unrelated characteristics in the majority classes. Existing approaches address this by balancing the class counts through oversampling or synthetic data generation. However, these methods can lead to loss of valuable information or to overfitting. Here, we introduce a deep learning framework for predicting minority-class materials, specifically metal-insulator transition (MIT) materials. The proposed approach, termed boosting-CGCNN, combines the crystal graph convolutional neural network (CGCNN) model with a gradient-boosting algorithm. The model effectively handles the extreme class imbalance in MIT material data by sequentially building a deeper neural network. Comparative evaluations demonstrate that the proposed model outperforms other approaches. Our approach is a promising solution for handling imbalanced data sets in materials science.
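The general idea of gradient boosting with neural weak learners on an imbalanced binary task can be illustrated with a minimal sketch. This is not the authors' boosting-CGCNN implementation: the abstract does not give architectural details, so the weak learner here is a hypothetical stand-in (a fixed random tanh layer with a ridge-regression readout on plain feature vectors) rather than a CGCNN acting on crystal graphs, and the names `fit_weak_learner`, `fit_boosted`, `n_stages`, and `lr` are assumptions made for illustration. Each stage fits the negative gradient of the log loss (the residual between labels and current probabilities) and is added to the running score, so no resampling of the minority class is needed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p, eps=1e-12):
    # Mean binary cross-entropy; probabilities are clipped for numerical safety.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_weak_learner(X, residual, rng, hidden=8, ridge=1e-3):
    # Hypothetical stand-in for a CGCNN stage (assumption, not the paper's model):
    # a fixed random tanh hidden layer followed by a ridge-regression readout
    # trained to predict the current residuals.
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(hidden), H.T @ residual)
    return lambda Xn: np.tanh(Xn @ W + b) @ beta


def fit_boosted(X, y, n_stages=50, lr=0.5, seed=0):
    # Gradient boosting on the log loss: start from the class-prior log-odds,
    # then repeatedly fit a weak learner to the negative gradient y - p.
    rng = np.random.default_rng(seed)
    prior = np.clip(y.mean(), 1e-6, 1.0 - 1e-6)
    f0 = np.log(prior / (1.0 - prior))
    scores = np.full(len(y), f0)
    learners = []
    for _ in range(n_stages):
        residual = y - sigmoid(scores)
        learner = fit_weak_learner(X, residual, rng)
        scores = scores + lr * learner(X)
        learners.append(learner)

    def predict_proba(Xn):
        out = np.full(len(Xn), f0)
        for learner in learners:
            out = out + lr * learner(Xn)
        return sigmoid(out)

    return predict_proba
```

Calling `fit_boosted(X, y)` with a feature matrix `X` and 0/1 labels `y` returns a function that maps new feature rows to minority-class probabilities. Because the initial score is the class-prior log-odds, the first stages concentrate on the samples the prior gets wrong, which on a heavily imbalanced set are the minority examples.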