Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
Genes (Basel). 2021 Nov 18;12(11):1814. doi: 10.3390/genes12111814.
Biological omics data such as transcriptomes and methylomes follow the inherent "large p, small n" paradigm, i.e., the number of features is much larger than the number of samples. A feature selection (FS) algorithm selects a subset of the transcriptomic or methylomic biomarkers in order to build a better prediction model. The hidden patterns in the FS solution space make it challenging to find a feature subset with satisfactory prediction performance. Swarm intelligence (SI) algorithms mimic the target-searching behaviors of various animals and have demonstrated promising capabilities in selecting features that yield good machine learning performance. Our study revealed that different SI-based feature selection algorithms contribute complementary searching capabilities in the FS solution space, and that their collaboration generates a better feature subset than the individual SI feature selection algorithms do. Nine SI-based feature selection algorithms were integrated to vote on the selected features, which were further refined by the dynamic recursive feature elimination framework. In most cases, the proposed Zoo algorithm outperformed the existing feature selection algorithms on transcriptomics and methylomics datasets.
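The vote-then-refine idea described above can be illustrated with a minimal sketch. It is not the Zoo implementation. The function names `vote_features` and `backward_eliminate`, the `min_votes` threshold, and the toy scoring function are all hypothetical. The refinement step is a plain greedy backward elimination, standing in for the dynamic recursive feature elimination framework, whose details the abstract does not give. In practice the score would likely be something like the cross-validated accuracy of a classifier trained on the candidate subset.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence


def vote_features(subsets: Sequence[Iterable[int]], min_votes: int) -> List[int]:
    """Keep the features chosen by at least `min_votes` selectors (hypothetical rule)."""
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_votes)


def backward_eliminate(
    features: List[int], score: Callable[[List[int]], float]
) -> List[int]:
    """Greedy stand-in for the refinement step (an assumption, not the paper's method).

    Repeatedly drop the single feature whose removal raises the score the most,
    and stop as soon as no removal improves the score.
    """
    current = list(features)
    best = score(current)
    while len(current) > 1:
        candidates = [
            (score([f for f in current if f != drop]), drop) for drop in current
        ]
        cand_score, drop = max(candidates)
        if cand_score <= best:
            break
        best = cand_score
        current = [f for f in current if f != drop]
    return current


# Toy data: three selectors (the study integrates nine) each propose feature indices.
subsets = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
voted = vote_features(subsets, min_votes=2)  # [1, 2, 3]

# Toy score: features 1 and 2 are "informative"; each extra feature costs 0.1.
informative = {1, 2}


def toy_score(feats: List[int]) -> float:
    return len(informative & set(feats)) - 0.1 * len(feats)


refined = backward_eliminate(voted, toy_score)  # [1, 2]
```

Voting first shrinks the search space to features that several independent searches agree on, so the more expensive refinement loop only has to evaluate a small candidate set.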