Reddy G Sekhar, Chittineni Suneetha
Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, Andhra Pradesh, India.
Department of Computer Applications, RVR&JC College of Engineering, Guntur, Andhra Pradesh, India.
PeerJ Comput Sci. 2021 Apr 7;7:e424. doi: 10.7717/peerj-cs.424. eCollection 2021.
Information efficiency is becoming increasingly important in both the development and application sectors of information technology. Data mining is a computer-assisted process for investigating massive datasets and extracting meaningful information from them. The mined information supports decision-making by helping to explain the behavior of each attribute. This paper therefore introduces a new classification algorithm to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the information gain for a given dataset: SHO updates the optimal weights applied to the information gain. The dataset is then partitioned into two classes based on quadratic entropy and information gain. Optimizing the decision tree gain is the main aim of the proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers such as ID3 and CART. Accuracy and area under the receiver operating characteristic curve (AUROC) are estimated and compared with those of existing algorithms such as ant colony optimization, particle swarm optimization, and cuckoo search.
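As a rough illustration of the split criterion described in the abstract, the sketch below computes a quadratic entropy and a weighted information gain for a categorical attribute. It is not the authors' implementation. The form of quadratic entropy used here (the sum of p_i(1 - p_i) over classes) is an assumption, as are the function names and the single scalar `weight`, which stands in for a value that an optimizer such as SHO might tune. The SHO search itself is not shown.

```python
from collections import Counter


def quadratic_entropy(labels):
    """Quadratic entropy sum_i p_i * (1 - p_i) of a list of class labels.

    Assumed definition; the abstract does not state the exact formula.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def weighted_gain(labels, feature_values, weight=1.0):
    """Gain of splitting on a categorical feature, scaled by `weight`.

    `weight` is a hypothetical stand-in for a weight chosen by an
    optimizer; here it is simply passed in by the caller.
    """
    n = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by group size.
    remainder = sum(len(g) / n * quadratic_entropy(g) for g in groups.values())
    return weight * (quadratic_entropy(labels) - remainder)


if __name__ == "__main__":
    y = ["a", "a", "b", "b"]
    print(weighted_gain(y, [0, 0, 1, 1]))  # feature separates the classes
    print(weighted_gain(y, [0, 1, 0, 1]))  # feature carries no information
```

In a tree builder, the attribute with the largest weighted gain would be chosen at each node. How the weights enter the authors' criterion is only summarized in the abstract, so this scaling should be read as one plausible interpretation.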