Feature Selection in the Data Stream Based on Incremental Markov Boundary Learning.

Authors

Wu Xingyu, Jiang Bingbing, Wang Xiangyu, Ban Taiyu, Chen Huanhuan

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):6740-6754. doi: 10.1109/TNNLS.2023.3249767. Epub 2023 Oct 5.

DOI:10.1109/TNNLS.2023.3249767

Abstract

Recent years have witnessed the proliferation of techniques for streaming data mining to meet the demands of many real-time systems, where high-dimensional streaming data are generated at high speed, increasing the burden on both hardware and software. Several feature selection algorithms for streaming data have been proposed to tackle this issue. However, these algorithms do not consider the distribution shift that arises in nonstationary scenarios, so their performance degrades when the underlying distribution of the data stream changes. To solve this problem, this article investigates feature selection in streaming data through incremental Markov boundary (MB) learning and proposes a novel algorithm. Unlike existing algorithms, which focus on prediction performance on off-line data, the proposed approach learns the MB by analyzing conditional dependence/independence in the data, which uncovers the underlying mechanism and is naturally more robust against distribution shift. To learn the MB in the data stream, the algorithm transforms the information learned in previous data blocks into prior knowledge and uses it to assist MB discovery in the current data block, while monitoring the likelihood of distribution shift and the reliability of the conditional independence tests to avoid the negative impact of invalid prior information. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of the proposed algorithm.
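The block-wise idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a plain grow-shrink search with a Fisher z partial-correlation test (which assumes roughly Gaussian, linearly related variables), and it handles stale prior knowledge only by re-testing every inherited feature in the shrink phase rather than by monitoring shift likelihood or test reliability. All names (`fisher_z_pvalue`, `markov_boundary`, `stream_markov_boundary`, `make_block`) and the synthetic data are hypothetical.

```python
import math
import numpy as np

def fisher_z_pvalue(data, i, j, cond):
    """Two-sided p-value for X_i independent of X_j given X_cond (Gaussian assumption)."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)  # partial correlations come from the precision matrix
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = max(min(r, 0.999999), -0.999999)
    dof = data.shape[0] - len(cond) - 3
    if dof <= 0:
        raise ValueError("block too small for this conditioning set")
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(dof)
    return math.erfc(abs(z) / math.sqrt(2))

def markov_boundary(data, target, prior=(), alpha=0.01):
    """Grow-shrink MB search, seeded with a prior MB from earlier blocks."""
    mb = [v for v in prior if v != target]
    changed = True
    while changed:  # grow: add features still dependent on the target given mb
        changed = False
        for v in range(data.shape[1]):
            if v == target or v in mb:
                continue
            if fisher_z_pvalue(data, target, v, mb) < alpha:
                mb.append(v)
                changed = True
    for v in list(mb):  # shrink: drop features (including stale priors) that test independent
        rest = [u for u in mb if u != v]
        if fisher_z_pvalue(data, target, v, rest) >= alpha:
            mb.remove(v)
    return sorted(mb)

def stream_markov_boundary(blocks, target, alpha=0.01):
    """Process blocks in order, passing each block's MB to the next as prior."""
    mb, history = [], []
    for block in blocks:
        mb = markov_boundary(block, target, prior=mb, alpha=alpha)
        history.append(mb)
    return history

def make_block(rng, n, b_weight):
    """Hypothetical data: columns are A, B, T, C, S, noise; T is the target (index 2)."""
    a, b, s, noise = (rng.normal(size=n) for _ in range(4))
    t = a + b_weight * b + rng.normal(size=n)  # A and B are parents of T
    c = t + s + rng.normal(size=n)             # C is a child of T, S is a spouse
    return np.column_stack([a, b, t, c, s, noise])

rng = np.random.default_rng(0)
# In the second block B no longer influences T, mimicking a distribution shift.
blocks = [make_block(rng, 2000, 1.0), make_block(rng, 2000, 0.0)]
history = stream_markov_boundary(blocks, target=2, alpha=1e-3)
print(history)
```

In this toy setup the true MB of T is {A, B, C, S} in the first block and {A, C, S} in the second. Seeding the search with the previous MB saves grow-phase tests when the distribution is stable, while the unconditional re-test in the shrink phase is a crude stand-in for the shift monitoring the abstract describes.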

Similar articles

Accurate Markov Boundary Discovery for Causal Feature Selection.

IEEE Trans Cybern. 2020 Dec;50(12):4983-4996. doi: 10.1109/TCYB.2019.2940509. Epub 2020 Dec 3.

Causal Feature Selection With Dual Correction.

IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3178075.

Online Causal Feature Selection for Streaming Features.

IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1563-1577. doi: 10.1109/TNNLS.2021.3105585. Epub 2023 Feb 28.

Continuous Support Vector Regression for Nonstationary Streaming Data.

IEEE Trans Cybern. 2022 May;52(5):3592-3605. doi: 10.1109/TCYB.2020.3015266. Epub 2022 May 19.

Online feature selection with streaming features.

IEEE Trans Pattern Anal Mach Intell. 2013 May;35(5):1178-92. doi: 10.1109/TPAMI.2012.197.

Learning High-Dimensional Evolving Data Streams With Limited Labels.

IEEE Trans Cybern. 2022 Nov;52(11):11373-11384. doi: 10.1109/TCYB.2021.3070420. Epub 2022 Oct 17.

Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster.

IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5680-5693. doi: 10.1109/TNNLS.2024.3382033. Epub 2025 Feb 28.

COMPOSE: A semisupervised learning framework for initially labeled nonstationary streaming data.

IEEE Trans Neural Netw Learn Syst. 2014 Jan;25(1):12-26. doi: 10.1109/TNNLS.2013.2277712.

Visual Structural Assessment and Anomaly Detection for High-Velocity Data Streams.

IEEE Trans Cybern. 2021 Dec;51(12):5979-5992. doi: 10.1109/TCYB.2020.2973137. Epub 2021 Dec 22.
