Mostafa Reham R, El-Attar Noha E, Sabbeh Sahar F, Vidyarthi Ankit, Hashim Fatma A
Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura, 35516 Egypt.
Faculty of Computers and Artificial Intelligence, Benha University, Banha, Egypt.
Soft Comput. 2022 May 9:1-29. doi: 10.1007/s00500-022-07115-7.
The rapid growth of data generated by applications in engineering, biotechnology, energy, and other fields has become a crucial challenge in high-dimensional data mining. Large datasets, especially high-dimensional ones, may contain many irrelevant, redundant, or noisy features, which can reduce the accuracy and efficiency of the industrial data mining process. Recently, several meta-heuristic optimization algorithms have been used to build feature selection techniques that cope with high dimensionality. Although these algorithms can find a near-optimal feature subset in the search space, they still face some global optimization challenges. This paper proposes an improved version of the sooty tern optimization algorithm (STOA), named the ST-AL method, to improve search performance on high-dimensional industrial optimization problems. The ST-AL method boosts the performance of STOA by applying four strategies. The first strategy uses control randomization parameters that balance the exploration and exploitation stages during the search and help avoid falling into local optima. The second strategy creates a new exploration phase based on the ant lion (AL) algorithm. The third strategy improves the STOA exploitation phase by modifying the main position-update equation. Finally, greedy selection is used to discard poorly performing generated solutions, which keeps the population from diverging from the promising regions already found. To evaluate the proposed ST-AL algorithm, it was employed as a global optimization method to find the optimal values of ten CEC2020 benchmark functions. It was also applied as a feature selection approach to 16 benchmark datasets from the UCI repository and compared with seven well-known optimization-based feature selection methods.
The experimental results show that the proposed algorithm is superior in avoiding local minima and in increasing the convergence rate. When the results are compared with the state-of-the-art algorithms ALO, STOA, PSO, GWO, HHO, MFO, and MPA, the mean accuracy achieved is in the range 0.94-1.00.
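The greedy selection strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the objective function `sphere`, the random `perturb` step (a placeholder for the STOA and ant lion position updates, whose equations the abstract does not give), and all parameter values are assumptions chosen only to make the example runnable.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0."""
    return sum(v * v for v in x)


def perturb(x, rng, step=0.5):
    """Placeholder move; stands in for the actual position-update equations."""
    return [v + rng.uniform(-step, step) for v in x]


def greedy_select(population, candidates, objective):
    """Keep a candidate only if it is strictly better than the agent it replaces."""
    survivors = []
    for current, candidate in zip(population, candidates):
        if objective(candidate) < objective(current):
            survivors.append(candidate)
        else:
            survivors.append(current)
    return survivors


rng = random.Random(0)  # fixed seed so the run is repeatable
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
initial_fitness = [sphere(x) for x in population]

for _ in range(50):
    candidates = [perturb(x, rng) for x in population]
    population = greedy_select(population, candidates, sphere)

final_fitness = [sphere(x) for x in population]
```

Because a candidate replaces an agent only when it lowers the objective, each agent's fitness can never get worse from one iteration to the next, which is the property the abstract attributes to greedy selection.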