Department of Computer Engineering, Faculty of Engineering, Arak University, Arak, 38156-8-8349, Iran.
Comput Biol Med. 2022 Sep;148:105820. doi: 10.1016/j.compbiomed.2022.105820. Epub 2022 Jul 14.
Feature selection suffers from the curse of dimensionality, and the problem is exacerbated with high-dimensional data such as microarrays. Moreover, because microarray data have few instances and many features (the low-instance/high-feature, or LIHF, property), computing and comparing feature scores to choose the best subset takes considerable processing time, which has prompted many efforts to mitigate the LIHF property of such genomic medicine data. Motivated by the promising results of ensemble models in machine learning, this paper presents a novel framework for microarray data classification, named feature-level aggregation-based ensemble based on overlapped feature subspace partitioning (FLAE-OFSP). The proposed ensemble has three main steps: first, several feature subsets are generated by the proposed partitioning approach; second, a feature selection algorithm (i.e., a feature ranker) is applied to each subset; and finally, the resulting rankings are combined into a single ranked list using six defined aggregation functions. Evaluation on seven microarray datasets using four measures (stability, classification accuracy, runtime, and Modscore) shows a substantial runtime improvement and high-quality results on the other measures compared with individual methods.
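The three steps described above (partition, rank, aggregate) can be illustrated with a minimal Python sketch. The abstract does not specify the partitioning scheme, the ranker, or the six aggregation functions, so everything concrete here is an assumption made for illustration: the wrap-around overlap in `overlapped_partitions`, the mean-difference score in `rank_subset`, and the four aggregators (`min`, `max`, `mean`, `median`) are hypothetical stand-ins, not the authors' method.

```python
import random
from statistics import mean, median


def overlapped_partitions(n_features, n_parts, overlap, seed=0):
    """Shuffle feature indices, cut them into blocks, and let each block
    borrow `overlap` features from the next block (wrapping around).
    Assumed scheme; the paper's partitioning may differ."""
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    size = -(-n_features // n_parts)  # ceiling division
    blocks = [idx[i * size:(i + 1) * size] for i in range(n_parts)]
    blocks = [b for b in blocks if b]  # drop empty trailing blocks
    parts = []
    for k, block in enumerate(blocks):
        nxt = blocks[(k + 1) % len(blocks)]
        extra = nxt[:overlap] if len(blocks) > 1 else []
        parts.append(block + extra)
    return parts


def rank_subset(X, y, subset):
    """Hypothetical ranker: score a feature by the absolute difference of
    its class means (binary labels 0/1), then return normalized ranks in
    [0, 1], where 0 is the best feature in this subset."""
    def score(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(pos) - mean(neg))

    ordered = sorted(subset, key=score, reverse=True)
    denom = max(len(ordered) - 1, 1)
    return {j: r / denom for r, j in enumerate(ordered)}


# Illustrative aggregators only; the paper defines six of its own.
AGGREGATORS = {"min": min, "max": max, "mean": mean, "median": median}


def flae_ofsp_sketch(X, y, n_parts=3, overlap=2, agg="mean", seed=0):
    """Rank every subset independently, then merge the per-subset ranks of
    each feature into one global list (best feature first)."""
    collected = {}
    for subset in overlapped_partitions(len(X[0]), n_parts, overlap, seed):
        for j, r in rank_subset(X, y, subset).items():
            collected.setdefault(j, []).append(r)
    combine = AGGREGATORS[agg]
    final = {j: combine(ranks) for j, ranks in collected.items()}
    return sorted(final, key=lambda j: (final[j], j))
```

Because each ranker call sees only a fraction of the features, the per-subset work is smaller and the subsets could in principle be processed in parallel, which is one plausible source of the runtime gains the abstract reports. Normalizing ranks within each subset keeps subsets of different sizes comparable before aggregation.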