Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse.

Authors

Băncioiu Camil, Brad Remus

Affiliation

Department of Computer Science and Electrical Engineering, Lucian Blaga University of Sibiu, 550024 Sibiu, Romania.

Publication details

Entropy (Basel). 2021 Nov 12;23(11):1501. doi: 10.3390/e23111501.

DOI: 10.3390/e23111501
PMID: 34828198
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8619989/
Abstract

This article presents a novel and remarkably efficient method of computing the statistical G-test, made possible by exploiting a connection with the fundamental elements of information theory: by writing the statistic as a sum of joint entropy terms, its computation is decomposed into easily reusable partial results with no change in the resulting value. This method greatly improves the efficiency of applications that perform a series of G-tests on permutations of the same features, such as feature selection and causal inference applications, because this decomposition allows for intensive reuse of these partial results. The efficiency of this method is demonstrated by implementing it as part of an experiment involving IPC-MB, an efficient Markov blanket discovery algorithm, applicable both as a feature selection algorithm and as a causal inference method. The results show outstanding efficiency gains for IPC-MB when the G-test is computed with the proposed method, compared to the unoptimized G-test, and also when compared to IPC-MB++, a variant of IPC-MB enhanced with an AD-tree, both static and dynamic. Although this method of computing the G-test is presented here in the context of IPC-MB, it is bound neither to IPC-MB in particular nor to feature selection or causal inference applications in general, because it targets the information-theoretic concept that underlies the G-test, namely conditional mutual information. This grants it wide applicability in data science.
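
The decomposition described in the abstract can be illustrated with the standard identity G = 2N · I(X;Y|Z) = 2N · [H(X,Z) + H(Y,Z) − H(X,Y,Z) − H(Z)], with entropies in nats. The Python sketch below is not the authors' implementation. The function names, the dictionary cache, and the random test data are assumptions made for illustration, and the degrees of freedom and p-value are left out. It memoizes each joint entropy by the set of variables involved, so tests over permutations of the same features share partial results.

```python
import math
from collections import Counter

import numpy as np


def joint_entropy(data, cols, cache):
    """Joint entropy (in nats) of the given columns, memoized by column set."""
    key = frozenset(cols)
    if key not in cache:
        if not key:
            cache[key] = 0.0  # entropy of an empty set of variables
        else:
            idx = sorted(key)
            counts = Counter(map(tuple, data[:, idx].tolist()))
            n = data.shape[0]
            cache[key] = -sum(c / n * math.log(c / n) for c in counts.values())
    return cache[key]


def g_statistic(data, x, y, z, cache):
    """G = 2N * I(X;Y|Z), written as a sum of four joint entropy terms."""
    z = tuple(z)
    n = data.shape[0]
    cmi = (
        joint_entropy(data, (x, *z), cache)
        + joint_entropy(data, (y, *z), cache)
        - joint_entropy(data, (x, y, *z), cache)
        - joint_entropy(data, z, cache)
    )
    return 2.0 * n * cmi


def g_statistic_direct(data, x, y, z):
    """Reference G = 2 * sum(O * ln(O / E)), with E computed per stratum of Z."""
    z = list(z)
    n_xyz = Counter(map(tuple, data[:, [x, y] + z].tolist()))
    n_xz = Counter(map(tuple, data[:, [x] + z].tolist()))
    n_yz = Counter(map(tuple, data[:, [y] + z].tolist()))
    n_z = Counter(map(tuple, data[:, z].tolist()))
    total = 0.0
    for (xv, yv, *zv), observed in n_xyz.items():
        zv = tuple(zv)
        expected = n_xz[(xv, *zv)] * n_yz[(yv, *zv)] / n_z[zv]
        total += observed * math.log(observed / expected)
    return 2.0 * total


# Hypothetical discrete data set: 500 samples, 4 features with 3 levels each.
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(500, 4))

cache = {}
g1 = g_statistic(data, 0, 1, (2,), cache)  # computes 4 entropy terms
g2 = g_statistic(data, 0, 3, (2,), cache)  # reuses H(X0,X2) and H(X2)
```

After the two calls the cache holds six entropy terms instead of eight, because two of them are shared. Keying the cache by an unordered set of column indices means that G(X,Y|Z) and G(Y,X|Z) resolve to the same entries. This is the kind of reuse the abstract describes, shown here on a toy scale.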

Similar articles

1
Accelerating Causal Inference and Feature Selection Methods through G-Test Computation Reuse.
Entropy (Basel). 2021 Nov 12;23(11):1501. doi: 10.3390/e23111501.
2
Accurate Markov Boundary Discovery for Causal Feature Selection.
IEEE Trans Cybern. 2020 Dec;50(12):4983-4996. doi: 10.1109/TCYB.2019.2940509. Epub 2020 Dec 3.
3
Online Causal Feature Selection for Streaming Features.
IEEE Trans Neural Netw Learn Syst. 2023 Mar;34(3):1563-1577. doi: 10.1109/TNNLS.2021.3105585. Epub 2023 Feb 28.
4
Efficient Markov Blanket Discovery and Its Application.
IEEE Trans Cybern. 2017 May;47(5):1169-1179. doi: 10.1109/TCYB.2016.2539338. Epub 2016 Mar 24.
5
Causal Feature Selection With Dual Correction.
IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3178075.
6
Hybrid Causal Feature Selection for Cancer Biomarker Identification From RNA-Seq Data.
IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1645-1655. doi: 10.1109/TCBB.2024.3406922. Epub 2024 Dec 10.
7
Interleaved incremental association Markov blanket as a potential feature selection method for improving accuracy in near-infrared spectroscopic analysis.
Talanta. 2018 Feb 1;178:348-354. doi: 10.1016/j.talanta.2017.09.039. Epub 2017 Sep 21.
8
Macromolecular crowding: chemistry and physics meet biology (Ascona, Switzerland, 10-14 June 2012).
Phys Biol. 2013 Aug;10(4):040301. doi: 10.1088/1478-3975/10/4/040301. Epub 2013 Aug 2.
9
The MVGC multivariate Granger causality toolbox: a new approach to Granger-causal inference.
J Neurosci Methods. 2014 Feb 15;223:50-68. doi: 10.1016/j.jneumeth.2013.10.018. Epub 2013 Nov 5.
10
Markov Blanket Feature Selection Using Representative Sets.
IEEE Trans Neural Netw Learn Syst. 2017 Nov;28(11):2775-2788. doi: 10.1109/TNNLS.2016.2602365.
