IEEE Trans Cybern. 2017 Dec;47(12):4356-4366. doi: 10.1109/TCYB.2016.2609408. Epub 2016 Sep 28.
A novel similarity-based feature selection algorithm is developed using the concept of distance correlation. A feature subset is selected on the basis of this similarity measure between pairs of features, without assuming any underlying distribution of the data. The pairwise similarity is then employed, in a message passing framework, to select a set of exemplar features with minimum redundancy and a reduced need for parameter tuning. The algorithm does not need an exhaustive traversal of the search space. The methodology is then extended to handle large data, using an inherent property of distance correlation. The effectiveness of the algorithm is demonstrated on nine publicly available datasets.
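To make the pairwise similarity concrete, the following is a minimal sketch of the standard sample distance correlation between two features, and of a feature-by-feature similarity matrix built from it. It is not the authors' implementation: the function names `distance_correlation` and `feature_similarity_matrix` are hypothetical, each feature is assumed to be a one-dimensional numeric column, and the message passing step and the large-data extension described in the abstract are not shown.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float).reshape(-1)
    y = np.asarray(y, dtype=float).reshape(-1)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same number of samples")

    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x2 = (A * A).mean()  # squared sample distance variance of x
    dvar_y2 = (B * B).mean()  # squared sample distance variance of y

    denom = np.sqrt(dvar_x2 * dvar_y2)
    if denom == 0.0:
        # A constant feature has zero distance variance; treat it as unrelated.
        return 0.0
    # Clip tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def feature_similarity_matrix(X):
    """Symmetric matrix of distance correlations between the columns of X."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    S = np.eye(n_features)
    for i in range(n_features):
        for j in range(i + 1, n_features):
            S[i, j] = S[j, i] = distance_correlation(X[:, i], X[:, j])
    return S
```

Unlike the Pearson coefficient, this measure can detect nonlinear dependence: for a feature `x` that is symmetric about zero, `x` and `x**2` have a Pearson correlation near zero but a clearly positive distance correlation. A matrix such as the one returned by `feature_similarity_matrix` is the kind of pairwise similarity input that an exemplar-selection step could consume.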