The Chi-Square Test of Distance Correlation.

Authors

Shen Cencheng, Panda Sambit, Vogelstein Joshua T

Affiliations

Department of Applied Economics and Statistics, University of Delaware.

Institute for Computational Medicine, Department of Biomedical Engineering, Johns Hopkins University.

Publication

J Comput Graph Stat. 2022;31(1):254-262. doi: 10.1080/10618600.2021.1938585. Epub 2021 Jul 19.

Abstract

Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if the underlying variables are independent, making it an ideal choice for discovering any type of dependency structure given sufficient sample size. One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and the choice of metric, a permutation test is typically required to estimate the null distribution and compute the p-value, which is very costly for large amounts of data. To overcome this difficulty, we propose a chi-square test for distance correlation. Method-wise, the chi-square test is non-parametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel. The test exhibits testing power similar to that of the standard permutation test, and can be used for K-sample and partial testing. Theory-wise, we show that the underlying chi-square distribution closely approximates and dominates the limiting null distribution in the upper tail, prove that the chi-square test can be valid and universally consistent for testing independence, and establish a testing power inequality with respect to the permutation test.
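The abstract does not spell out the test's formula, so the following is only a minimal sketch under stated assumptions. It assumes the test compares n times the bias-corrected sample distance correlation against a chi-square distribution with one degree of freedom shifted by -1, which is our reading of the full paper and not something stated above. It also assumes 1-D samples, the Euclidean metric, and the U-centering form of bias correction. The function names (`u_centered`, `inner`, `chi2_dcorr_test`) are hypothetical and written only for illustration.

```python
import math
import random


def u_centered(x):
    """U-centered (bias-corrected) distance matrix of a 1-D sample, n >= 4."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]   # pairwise Euclidean distances
    row = [sum(r) for r in d]                  # row sums (= column sums by symmetry)
    total = sum(row)
    u = [[0.0] * n for _ in range(n)]          # diagonal stays zero
    for i in range(n):
        for j in range(n):
            if i != j:
                u[i][j] = (d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
                           + total / ((n - 1) * (n - 2)))
    return u


def inner(a, b):
    """Unbiased inner product of two U-centered matrices."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return s / (n * (n - 3))


def chi2_dcorr_test(x, y):
    """Return (bias-corrected distance correlation, approximate p-value).

    Assumption: p = P(chi2_1 - 1 > n * stat). No permutations are needed.
    """
    n = len(x)
    a, b = u_centered(x), u_centered(y)
    denom = math.sqrt(inner(a, a) * inner(b, b))
    stat = inner(a, b) / denom if denom > 0 else 0.0
    t = n * stat + 1.0
    # Survival function of chi-square with 1 df: sf(t) = erfc(sqrt(t / 2)).
    pvalue = math.erfc(math.sqrt(t / 2.0)) if t > 0 else 1.0
    return stat, pvalue


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1, 1) for _ in range(200)]
    y_dep = [v * v + 0.1 * random.gauss(0, 1) for v in x]   # nonlinear dependence
    y_ind = [random.uniform(-1, 1) for _ in range(200)]     # independent of x
    print("dependent:  ", chi2_dcorr_test(x, y_dep))
    print("independent:", chi2_dcorr_test(x, y_ind))
```

The p-value here comes from a single closed-form tail evaluation, whereas a permutation test would recompute the statistic once per permutation. That difference is the speed gain the abstract refers to.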

