Prasetiyowati Maria Irmina, Maulidevi Nur Ulfa, Surendro Kridanto
Doctoral Program of Electrical Engineering and Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Jawa Barat, Indonesia.
Department of Electrical Engineering and Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Jawa Barat, Indonesia.
PeerJ Comput Sci. 2022 Jul 14;8:e1041. doi: 10.7717/peerj-cs.1041. eCollection 2022.
One of the significant purposes of building a model is to increase its accuracy within a shorter timeframe through the feature selection process. This is carried out by determining the importance of the available features in a dataset using Information Gain (IG). IG calculates the amount of information contained in each feature, and features with high values are selected to accelerate the algorithm's performance. To select informative features, IG uses a threshold (cut-off) value. Therefore, this research aims to determine the time and accuracy performance gained by improving feature selection through the integration of IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. A total of eight datasets, three balanced and five imbalanced, were used in this research. SMOTE was applied to the imbalanced datasets to balance the data. The results show that feature selection using Information Gain, FFT, and SMOTE improved the accuracy performance of Random Forest.
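A minimal Python sketch of the pipeline described above, assuming scikit-learn's mutual_info_classif as a stand-in for the Information Gain measure and imbalanced-learn's SMOTE for oversampling. The paper's FFT-based derivation of the cut-off value is not specified in the abstract, so an illustrative fixed threshold is used instead; dataset names, parameters, and the threshold value are assumptions, not the authors' settings.

# Sketch: IG-based feature selection + SMOTE + Random Forest (assumed pipeline).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Imbalanced toy dataset standing in for the study's datasets.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# 1. Rank features by an Information Gain estimate and keep those above the cut-off.
ig = mutual_info_classif(X_train, y_train, random_state=42)
threshold = 0.01  # illustrative cut-off, not the paper's FFT-derived value
selected = ig >= threshold

# 2. Balance the minority class with SMOTE on the reduced feature set.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train[:, selected], y_train)

# 3. Train and evaluate Random Forest on the selected, balanced data.
clf = RandomForestClassifier(random_state=42).fit(X_bal, y_bal)
print("accuracy:", clf.score(X_test[:, selected], y_test))

In this sketch the cut-off is applied directly to the IG scores; the study's contribution is how that cut-off is chosen (via FFT) and how the combination behaves across balanced and imbalanced datasets.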