Borboudakis Giorgos, Tsamardinos Ioannis
University of Crete, Heraklion, Greece.
Gnosis Data Analysis (JADBio), Heraklion, Greece.
Data Min Knowl Discov. 2021;35(4):1393-1434. doi: 10.1007/s10618-020-00731-7. Epub 2021 May 1.
Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments, we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying solutions similar to those found by TIE*, and (b) the identified solutions have similar predictive performance.