Multi-scale Fisher's independence test for multivariate dependence.

Authors

Gorsky S, Ma L

Affiliations

Department of Mathematics and Statistics, University of Massachusetts Amherst, 710 N. Pleasant Street, Amherst, Massachusetts 01003, U.S.A.

Department of Statistical Science, Duke University, Box 90251, Durham, North Carolina 27708, U.S.A.

Publication information

Biometrika. 2022 Sep;109(3):569-587. doi: 10.1093/biomet/asac013. Epub 2022 Feb 21.

DOI:10.1093/biomet/asac013
PMID:36381997
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9648765/
Abstract

Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample size, making it difficult to apply them in the presence of massive sample sizes. Moreover, resampling is usually necessary to evaluate the statistical significance of the resulting test statistics at finite sample sizes, further worsening the computational burden. We introduce a scalable, resampling-free approach to testing the independence between two random vectors by breaking down the task into simple univariate tests of independence on a collection of 2 × 2 contingency tables constructed through sequential coarse-to-fine discretization of the sample space, transforming the inference task into a multiple testing problem that can be completed with almost linear complexity with respect to the sample size. To address increasing dimensionality, we introduce a coarse-to-fine sequential adaptive procedure that exploits the spatial features of dependency structures. We derive a finite-sample theory that guarantees the inferential validity of our adaptive procedure at any given sample size. We show that our approach can achieve strong control of the level of the testing procedure at any sample size without resampling or asymptotic approximation and establish its large-sample consistency. We demonstrate through an extensive simulation study its substantial computational advantage in comparison to existing approaches while achieving robust statistical power under various dependency scenarios, and illustrate how its divide-and-conquer nature can be exploited to not just test independence, but to learn the nature of the underlying dependency. Finally, we demonstrate the use of our method through analysing a dataset from a flow cytometry experiment.
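
The divide-and-test idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it handles only two scalar variables, rank-transforms each margin to the unit interval, scans every cell exhaustively up to a fixed depth instead of using the adaptive coarse-to-fine procedure, and combines the p-values with a plain Bonferroni correction. All of these are simplifying assumptions made here, and the names `fisher_exact_p`, `to_unit_ranks`, `multiscale_fisher_test` and the parameter `max_depth` are hypothetical.

```python
import math
import random


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left count equals k.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def to_unit_ranks(values):
    """Map values to (0, 1) by rank; assumes no ties (continuous data)."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    u = [0.0] * n
    for rank, idx in enumerate(order):
        u[idx] = (rank + 0.5) / n
    return u


def multiscale_fisher_test(x, y, max_depth=2):
    """Return (global p-value, per-cell Bonferroni-adjusted p-values).

    A cell at level (kx, ky) is a dyadic rectangle of the unit square.
    The points inside it are cross-classified by the halves of the cell
    they fall into, which gives one 2x2 table and one Fisher test per cell.
    """
    u, v = to_unit_ranks(x), to_unit_ranks(y)
    # Total number of cells over all levels, used as the Bonferroni factor.
    n_tests = (2 ** (max_depth + 1) - 1) ** 2
    adjusted = {}
    for kx in range(max_depth + 1):
        for ky in range(max_depth + 1):
            tables = {}
            for ui, vi in zip(u, v):
                # Position of the point one level finer than the cell.
                fx, fy = int(ui * 2 ** (kx + 1)), int(vi * 2 ** (ky + 1))
                table = tables.setdefault((fx // 2, fy // 2), [0, 0, 0, 0])
                table[2 * (fx % 2) + (fy % 2)] += 1
            for (i, j), table in tables.items():
                p = fisher_exact_p(*table)
                adjusted[(kx, ky, i, j)] = min(1.0, n_tests * p)
    return min(adjusted.values()), adjusted


if __name__ == "__main__":
    random.seed(0)
    x = [random.random() for _ in range(500)]
    y = [xi + random.gauss(0.0, 0.2) for xi in x]  # y depends on x
    p_global, cells = multiscale_fisher_test(x, y)
    print("global adjusted p-value:", p_global)
    print("most significant cell:", min(cells, key=cells.get))
```

Each pass over the data at a fixed level touches every point once, so the cost of this sketch grows linearly in the sample size for a fixed `max_depth`. The per-cell adjusted p-values also indicate where in the sample space the dependence shows up, which is the sense in which the abstract speaks of learning the nature of the dependency.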


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/f904299cb7af/nihms-1821790-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/f6ef8d103a80/nihms-1821790-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/0d7f8856f7ed/nihms-1821790-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/1b8bffc4a8d4/nihms-1821790-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/023a24821db5/nihms-1821790-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/2c91b9b2d10b/nihms-1821790-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/64fbe2c65c9a/nihms-1821790-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/f613146e9445/nihms-1821790-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/cc6432dab908/nihms-1821790-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fbf8/9648765/5c58694d5da1/nihms-1821790-f0011.jpg

Similar articles

1
Multi-scale Fisher's independence test for multivariate dependence.
Biometrika. 2022 Sep;109(3):569-587. doi: 10.1093/biomet/asac013. Epub 2022 Feb 21.
2
A simulation study for comparing testing statistics in response-adaptive randomization.
BMC Med Res Methodol. 2010 Jun 5;10:48. doi: 10.1186/1471-2288-10-48.
3
Asymptotic distributions of high-dimensional distance correlation inference.
Ann Stat. 2021 Aug;49(4):1999-2020. doi: 10.1214/20-aos2024. Epub 2021 Sep 29.
4
Comparison of tests of contingency tables.
J Biopharm Stat. 2017;27(5):784-796. doi: 10.1080/10543406.2016.1269786. Epub 2017 Jan 27.
5
A gate-keeping test for selecting adaptive interventions under general designs of sequential multiple assignment randomized trials.
Contemp Clin Trials. 2019 Oct;85:105830. doi: 10.1016/j.cct.2019.105830. Epub 2019 Aug 27.
6
Statistical Inferences for Complex Dependence of Multimodal Imaging Data.
J Am Stat Assoc. 2024;119(546):1486-1499. doi: 10.1080/01621459.2023.2200610. Epub 2023 May 26.
7
A discussion on significance indices for contingency tables under small sample sizes.
PLoS One. 2018 Aug 2;13(8):e0199102. doi: 10.1371/journal.pone.0199102. eCollection 2018.
8
Testing spatial symmetry using contingency tables based on nearest neighbor relations.
ScientificWorldJournal. 2014 Jan 19;2014:698296. doi: 10.1155/2014/698296. eCollection 2014.
9
Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures.
J Am Stat Assoc. 2020;115(529):393-402. doi: 10.1080/01621459.2018.1554485. Epub 2019 Apr 25.
10
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Cited by

1
Rank-based indices for testing independence between two high-dimensional vectors.
Ann Stat. 2024 Feb;52(1):184-206. doi: 10.1214/23-aos2339. Epub 2024 Mar 7.
