Qiu Feng, Zheng Pan, Heidari Ali Asghar, Liang Guoxi, Chen Huiling, Karim Faten Khalid, Elmannai Hela, Lin Haiping
Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China.
Information Systems, University of Canterbury, Christchurch 8014, New Zealand.
Biomedicines. 2022 Aug 22;10(8):2052. doi: 10.3390/biomedicines10082052.
Modern medicine and biology have produced a large volume of high-dimensional genetic data. Data-driven decision-making is particularly crucial to clinical practice and related procedures, but the high dimensionality of these data increases the complexity and scale of processing. Identifying representative genes and reducing the dimensionality of the data is often challenging. The purpose of gene selection is to eliminate irrelevant or redundant features in order to reduce computational cost and improve classification accuracy. Wrapper gene selection models evaluate candidate feature subsets, which can reduce the number of features and improve classification accuracy. This paper proposes a wrapper gene selection method based on the slime mould algorithm (SMA) to address this problem. SMA is a recent algorithm with considerable potential for application in feature selection. This paper improves the original SMA by combining a Cauchy mutation mechanism with a crossover mutation strategy based on differential evolution (DE). A transfer function then converts the continuous optimizer into a binary version so that it can solve the gene selection problem. First, the continuous version of the method, ISMA, is tested on 33 classical continuous optimization problems. Then, the discrete version, BISMA, is studied thoroughly by comparing it with other gene selection methods on 14 gene expression datasets. Experimental results show that the continuous version of the algorithm achieves an optimal balance between local exploitation and global search capabilities, and that the discrete version achieves the highest accuracy while selecting the fewest genes.
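The abstract names three ingredients (a Cauchy mutation, a transfer function that turns a continuous position into a binary gene mask, and a wrapper-style evaluation) without giving formulas. The following is a minimal sketch of how such pieces are commonly written, not the authors' implementation. The S-shaped (sigmoid) transfer function, the weighted fitness and its `alpha` parameter, and all function names are assumptions made for illustration.

```python
import math
import random


def s_shaped_transfer(x: float) -> float:
    """Map a continuous coordinate to a selection probability in [0, 1].

    Assumption: a sigmoid is used here; the abstract does not say which
    transfer function the paper adopts.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # numerically stable branch for negative x
    return e / (1.0 + e)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 gene mask."""
    return [1 if rng.random() < s_shaped_transfer(x) else 0 for x in position]


def cauchy_mutate(position, rng, scale=1.0):
    """Perturb each coordinate with heavy-tailed Cauchy noise.

    A standard Cauchy sample is drawn by the inverse-CDF method:
    tan(pi * (u - 0.5)) with u uniform on [0, 1).
    """
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5)) for x in position]


def wrapper_fitness(mask, error_rate, alpha=0.95):
    """Hypothetical wrapper objective (lower is better).

    Combines a classifier's error rate on the selected genes with the
    fraction of genes kept. The weighting alpha is an assumed value.
    """
    selected_fraction = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_fraction


rng = random.Random(0)
position = [2.0, -1.5, 0.3, -4.0, 1.1]
mutated = cauchy_mutate(position, rng)
mask = binarize(mutated, rng)
# error_rate would come from a classifier trained on the selected genes;
# 0.1 is a placeholder, not a result from the paper.
score = wrapper_fitness(mask, error_rate=0.1)
```

In a wrapper setting the `error_rate` argument would be produced by training and validating a classifier on the genes selected by `mask`, so that the optimizer trades off accuracy against the number of genes retained.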