Yang Xingli, Wang Yu, Wang Ruibo, Li Jihong
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):6628-6641. doi: 10.1109/TNNLS.2021.3128173. Epub 2023 Sep 1.
Ensemble feature selection (EFS) has attracted significant interest in the literature because of its potential to reduce the discovery rate of noise features and to stabilize feature selection results. Because block-regularized m × 2 cross-validation performs well both in estimating generalization performance and in comparing algorithms, this study proposes a novel EFS technique based on it. Unlike traditional ensemble learning, in which the selection frequency follows a binomial distribution, the feature selection frequency in the proposed technique is approximated more accurately by a beta distribution. Furthermore, theoretical analysis shows that the proposed technique yields a higher selection probability for important features, a lower selection risk for noise features, more true positives, and fewer false positives. Finally, these conclusions are verified in experiments on simulated and real data.
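As a rough illustration of the idea of computing feature selection frequencies over m replications of 2-fold cross-validation, the following Python sketch may help. It is a simplified sketch under stated assumptions, not the authors' method: the partitions are plain random halves, so the block regularization described in the abstract is omitted, and the base selector (top-k absolute Pearson correlation) is a stand-in chosen only for brevity. The names `abs_correlation`, `select_top_k`, and `selection_frequency` are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(X, y, rows, k):
    """Stand-in base selector (an assumption, not from the paper):
    the k features most correlated with y on the given rows."""
    p = len(X[0])
    ys = [y[i] for i in rows]
    scores = [abs_correlation([X[i][j] for i in rows], ys) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]


def selection_frequency(X, y, m=5, k=3, seed=0):
    """Fraction of the 2*m half-sample runs in which each feature is selected.

    Assumption: partitions are drawn at random; the paper's block
    regularization of the m partitions is NOT implemented here.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(m):
        perm = list(range(n))
        rng.shuffle(perm)
        # One 2-fold partition: each half is used once as the selection sample.
        for rows in (perm[: n // 2], perm[n // 2:]):
            for j in select_top_k(X, y, rows, k):
                counts[j] += 1
    return [c / (2 * m) for c in counts]
```

Calling `selection_frequency(X, y, m=4, k=2)` on a list-of-rows matrix `X` and a response list `y` returns one frequency per feature. Features with frequencies near 1 are selected consistently across half-samples, while noise features would be expected to have frequencies near 0.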