Department of Computer and Information Science, University of Macau, Macau SAR, China.
School of Computer Science and Engineering, University of New South Wales, New South Wales, Australia.
Sci Rep. 2017 Jun 28;7(1):4354. doi: 10.1038/s41598-017-04037-5.
High-dimensional datasets make feature selection difficult, so we propose a new method based on the Wolf Search Algorithm (WSA) for optimising the feature selection problem. The proposed approach follows the natural strategy attributed to Charles Darwin: 'It is not the strongest of the species that survives, but the most adaptable'. In the evolution of a swarm, this means that the elitists are motivated to quickly obtain more and better resources. A memory function helps the method avoid repeated searches of the worst positions, which makes the search more effective, while a binary strategy reduces the feature selection problem to a similar function optimisation problem. Furthermore, a wrapper strategy combines these strengthened wolves with an extreme learning machine classifier to find a sub-dataset with a reasonable number of features that maximises the accuracy of global classification models. Experimental results on six public high-dimensional bioinformatics datasets demonstrate that the proposed method can outperform some conventional feature selection methods by up to 29% in classification accuracy, and can outperform previous WSAs by up to 99.81% in computational time.
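The interplay of the binary strategy, the memory function and the wrapper evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the names `toy_fitness`, `flip_bits` and `binary_wolf_search` are hypothetical, the fitness function is a toy stand-in for the accuracy of an extreme learning machine trained on the selected features, the memory here skips every visited position rather than only the worst ones, and the elitist behaviour is omitted.

```python
import random


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for wrapper accuracy (assumption, not an ELM).

    Rewards selecting the 'relevant' feature indices and applies a small
    penalty per selected feature, so smaller subsets are preferred.
    """
    if not any(mask):
        return 0.0
    hits = sum(1 for i in relevant if mask[i])
    return hits / len(relevant) - 0.01 * sum(mask)


def flip_bits(mask, n_flips, rng):
    """Return a neighbouring binary position with n_flips bits toggled."""
    new = list(mask)
    for i in rng.sample(range(len(mask)), n_flips):
        new[i] = 1 - new[i]
    return tuple(new)


def binary_wolf_search(n_features, fitness, n_wolves=5, n_iters=1000, seed=0):
    """Simplified binary swarm search with a shared memory of visited positions."""
    rng = random.Random(seed)
    memory = {}  # position (tuple of 0/1) -> fitness; each position is scored once

    def evaluate(pos):
        if pos not in memory:
            memory[pos] = fitness(pos)
        return memory[pos]

    # Binary strategy: each wolf is a 0/1 mask over the features.
    wolves = [tuple(rng.randint(0, 1) for _ in range(n_features))
              for _ in range(n_wolves)]

    for _ in range(n_iters):
        for k, wolf in enumerate(wolves):
            candidate = flip_bits(wolf, rng.randint(1, 2), rng)
            if candidate in memory:
                continue  # memory function: do not search a visited position again
            if evaluate(candidate) > evaluate(wolf):
                wolves[k] = candidate  # move only when the subset scores better

    best = max(memory, key=memory.get)
    return best, memory[best]


# Usage: 12 features, of which indices 0, 3 and 7 are assumed to be informative.
best_mask, best_score = binary_wolf_search(12, lambda m: toy_fitness(m, (0, 3, 7)))
print(best_mask, best_score)
```

In a real wrapper setting, the `fitness` callable would train and validate a classifier on the columns selected by the mask, which is the expensive step that the memory is meant to avoid repeating.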