Causal Feature Selection With Dual Correction.

Author information

Guo Xianjie, Yu Kui, Liu Lin, Cao Fuyuan, Li Jiuyong

Publication information

IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3178075.

Abstract

Causal feature selection methods aim to identify a Markov boundary (MB) of a class variable, and almost all existing causal feature selection algorithms use conditional independence (CI) tests to learn the MB. However, in real-world applications, data issues (e.g., noisy or small samples) can make CI tests unreliable; thus, causal feature selection algorithms relying on CI tests encounter two types of errors: false positives (i.e., selecting false MB features) and false negatives (i.e., discarding true MB features). Existing algorithms tackle only one of these error types and cannot deal with both at the same time, leading to unsatisfactory results. To address this issue, we propose a dual-correction-strategy-based MB learning (DCMB) algorithm to correct the two types of errors simultaneously. Specifically, DCMB selectively removes false positives from the MB features currently selected, while selectively retrieving false negatives from the features currently discarded. To automatically determine the optimal number of selected features for the selective removal and retrieval in the dual correction strategy, we design the simulated-annealing-based DCMB (SA-DCMB) algorithm. Experiments on benchmark Bayesian network (BN) datasets demonstrate that DCMB achieves substantial improvements in MB learning accuracy over existing MB learning methods. Empirical studies on real-world datasets validate the effectiveness of SA-DCMB for classification against state-of-the-art causal and traditional feature selection algorithms.
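
The abstract does not give the algorithmic details of DCMB, so the following is only a toy sketch of the general idea of correcting both error types in one pass. It is not the authors' algorithm. The function name `dual_correction`, the `score` callback (a stand-in for any quality measure of a candidate feature set), and the budgets `max_remove` and `max_retrieve` are all assumptions made for illustration. The budgets are passed in as fixed inputs here, whereas the abstract states that SA-DCMB determines the number of features automatically via simulated annealing.

```python
from typing import Callable, FrozenSet, Iterable, Set

# Hypothetical scoring callback: higher is better for a candidate feature set.
Score = Callable[[FrozenSet[str]], float]


def dual_correction(
    selected: Iterable[str],
    all_features: Iterable[str],
    score: Score,
    max_remove: int,
    max_retrieve: int,
) -> Set[str]:
    """Toy dual-correction pass (illustrative only, not the paper's DCMB)."""
    current = set(selected)
    # Features discarded by the initial (possibly unreliable) MB estimate.
    discarded = set(all_features) - current

    # Selective removal: greedily drop suspected false positives,
    # but only while dropping a feature strictly improves the score.
    for _ in range(max_remove):
        if not current:
            break
        base = score(frozenset(current))
        best = max(sorted(current), key=lambda f: score(frozenset(current - {f})))
        if score(frozenset(current - {best})) <= base:
            break
        current.remove(best)

    # Selective retrieval: greedily bring back suspected false negatives,
    # but only while adding a feature strictly improves the score.
    for _ in range(max_retrieve):
        candidates = sorted(discarded - current)
        if not candidates:
            break
        base = score(frozenset(current))
        best = max(candidates, key=lambda f: score(frozenset(current | {f})))
        if score(frozenset(current | {best})) <= base:
            break
        current.add(best)

    return current


# Toy setup (assumed): the true MB is {A, B, C}; the initial estimate
# contains one false positive (X) and misses one true member (C).
TRUE_MB = frozenset({"A", "B", "C"})


def toy_score(features: FrozenSet[str]) -> float:
    # Reward true MB members, penalize everything else.
    return len(features & TRUE_MB) - len(features - TRUE_MB)


corrected = dual_correction(
    selected={"A", "B", "X"},
    all_features={"A", "B", "C", "X", "Y"},
    score=toy_score,
    max_remove=2,
    max_retrieve=2,
)
```

In this toy setup the removal phase drops `X` and the retrieval phase recovers `C`. With real data the true MB is unknown, so `score` would have to be replaced by something computable from the data, such as a validation-based measure; that choice is outside what the abstract specifies.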
