A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS.

Authors

Aw Alan J, Spence Jeffrey P, Song Yun S

Affiliations

Department of Statistics, University of California, Berkeley.

Department of Genetics, School of Medicine, Stanford University.

Publication information

Ann Appl Stat. 2024 Mar;18(1):858-881. doi: 10.1214/23-aoas1817. Epub 2024 Jan 31.

DOI:10.1214/23-aoas1817
PMID:38784669
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11115382/
Abstract

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or perhaps the features can be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
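To make the idea of testing sample exchangeability concrete, here is a minimal permutation-style sketch in plain Python. It is not the authors' implementation and does not use the flintyR or flintyPy API. The test statistic (variance of pairwise Hamming distances between samples), the function names, and the simplifying assumption that all features are mutually independent are illustrative choices that the abstract does not specify. Under that assumption, shuffling each feature column independently across samples leaves the joint distribution unchanged when the sample is exchangeable, so the shuffled statistics can serve as a null reference.

```python
import random
from itertools import combinations
from statistics import pvariance


def pairwise_distance_variance(rows):
    """Variance of Hamming distances over all pairs of samples (rows).

    Illustrative statistic chosen for this sketch; population structure tends
    to inflate it because within-group and between-group distances differ.
    """
    dists = [
        sum(a != b for a, b in zip(r1, r2))
        for r1, r2 in combinations(rows, 2)
    ]
    return pvariance(dists)


def permutation_p_value(rows, n_perm=200, seed=0):
    """Monte Carlo p-value for exchangeability, assuming independent features.

    Each feature (column) is shuffled independently across samples, and the
    statistic is recomputed on every shuffled data set.
    """
    rng = random.Random(seed)
    observed = pairwise_distance_variance(rows)
    columns = [list(col) for col in zip(*rows)]

    exceed = 0
    for _ in range(n_perm):
        shuffled = []
        for col in columns:
            c = col[:]          # copy so the original column is untouched
            rng.shuffle(c)
            shuffled.append(c)
        permuted_rows = list(zip(*shuffled))
        if pairwise_distance_variance(permuted_rows) >= observed:
            exceed += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): two perfectly separated groups of 10 samples,
# 20 binary features each, i.e. an extreme case of stratification.
stratified = [(0,) * 20 for _ in range(10)] + [(1,) * 20 for _ in range(10)]
p_stratified = permutation_p_value(stratified)
```

On the strongly stratified toy sample, the observed statistic lies far in the upper tail of the shuffled values, so the resulting p-value is small. When features are dependent, for example variants in linkage disequilibrium, shuffling columns independently would no longer be valid, and whole blocks of dependent features would need to be shuffled together instead.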

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/157ac6dcccbe/nihms-1937735-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/0e8a7bff15eb/nihms-1937735-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/754f045a096f/nihms-1937735-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/6ab0dbd13efe/nihms-1937735-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/a66bd739a797/nihms-1937735-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/594bf73d78ff/nihms-1937735-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ec7b/11115382/9c069e6af251/nihms-1937735-f0007.jpg
