IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4544-4554. doi: 10.1109/TPAMI.2021.3071138. Epub 2022 Aug 4.
L1-regularized logistic regression (L1-LR) is popular for classification problems. To accelerate its training on high-dimensional data, techniques called safe screening rules have recently been proposed. They safely delete the inactive features in the data and thereby greatly reduce the training cost of L1-LR. The screening power of such a rule is determined by its safe region, whose construction is the core technique of safe screening. In this paper, we introduce a new safe feature elimination rule (SFER) for L1-LR. Compared to existing safe rules, the safe region of SFER is improved in two respects: (1) a smaller sphere region is constructed by using the strong convexity of the dual of L1-LR twice; (2) multiple half-spaces, which correspond to the potentially active constraints, are added for further contraction. Both improvements enhance the screening ability of SFER. To control the complexity of SFER, an iterative filtering framework is given that decomposes the safe region into multiple "domes". In this way, SFER admits a closed-form solution, and features that have already been identified are not scanned again. Experiments on ten benchmark data sets demonstrate that SFER outperforms existing methods in training efficiency.
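To make the role of the safe region concrete, the sketch below shows a generic sphere-based screening test in Python. It is not SFER itself: it assumes that a ball (`center`, `radius`) containing the dual optimum is already known, and that the feature columns have been scaled so that each dual constraint reads |f_j . theta| <= `bound`. The function name `sphere_screen` and all parameter names are hypothetical. A feature is certified inactive when the largest value its constraint can take over the ball stays strictly below the bound, which is why a smaller safe region screens out more features.

```python
import math


def sphere_screen(features, center, radius, bound=1.0):
    """Return indices of features a sphere test certifies as inactive.

    features: list of feature columns (lists of floats), assumed to be scaled
              so that the dual constraint reads |f_j . theta| <= bound.
    center, radius: a ball assumed to contain the dual optimum.
    """
    inactive = []
    for j, f in enumerate(features):
        dot = sum(a * b for a, b in zip(f, center))
        norm = math.sqrt(sum(a * a for a in f))
        # Largest value of |f . theta| over the ball (Cauchy-Schwarz).
        upper = abs(dot) + radius * norm
        if upper < bound:
            # The constraint cannot be active anywhere in the ball.
            inactive.append(j)
    return inactive


# Toy data, made up for illustration only.
features = [[1.0, 0.0], [0.1, 0.1], [0.0, 0.5]]
center = [0.95, 0.2]
inactive = sphere_screen(features, center, radius=0.1)
print(inactive)  # [1, 2]
```

With a much larger radius (for example `radius=10.0`) the same call certifies nothing, which illustrates why the abstract stresses shrinking the sphere and intersecting it with half-spaces.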