A SIMPLE AND FLEXIBLE TEST OF SAMPLE EXCHANGEABILITY WITH APPLICATIONS TO STATISTICAL GENOMICS.

Author information

Aw Alan J, Spence Jeffrey P, Song Yun S

Affiliations

Department of Statistics, University of California, Berkeley.

Department of Genetics, School of Medicine, Stanford University.

Publication information

Ann Appl Stat. 2024 Mar;18(1):858-881. doi: 10.1214/23-aoas1817. Epub 2024 Jan 31.

Abstract

In scientific studies involving analyses of multivariate data, basic but important questions often arise for the researcher: Is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Are the features independent of one another, or can the features be grouped so that the groups are mutually independent? In statistical genomics, these considerations are fundamental to downstream tasks such as demographic inference and the construction of polygenic risk scores. We propose a non-parametric approach, which we call the V test, to address these two questions, namely, a test of sample exchangeability given the dependency structure of features, and a test of feature independence given sample exchangeability. Our test is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. Through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our test compares favorably in various scenarios of interest. We apply the test to data from the 1000 Genomes Project, demonstrating how it can be employed to assess exchangeability of the genetic sample, or find optimal linkage disequilibrium (LD) splits for downstream analysis. For exchangeability assessment, we find that removing rare variants can substantially increase the p-value of the test statistic. For optimal LD splitting, the V test reports different optimal splits than previous approaches not relying on hypothesis testing. Software for our methods is available in R (CRAN: flintyR) and Python (PyPI: flintyPy).
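To make the first question in the abstract concrete, the following is a minimal sketch of a permutation test of sample exchangeability for binary data. It is not the flintyPy API and is not taken from the paper: the function names, the choice of statistic (the variance of pairwise Hamming distances between samples), and the assumption that all features are mutually independent (so that each column can be shuffled on its own) are illustrative assumptions. The large-sample asymptotics mentioned in the abstract are not implemented here.

```python
import numpy as np


def pairwise_distance_variance(X):
    """Variance of the Hamming distances over all pairs of rows of X.

    Illustrative statistic (an assumption, not necessarily the paper's).
    """
    n = X.shape[0]
    # D[i, j] = number of features on which samples i and j differ
    D = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    upper = np.triu_indices(n, k=1)  # each unordered pair once
    return D[upper].var()


def permutation_exchangeability_test(X, n_perm=500, seed=0):
    """Permutation p-value for H0: the rows (samples) of X are exchangeable.

    Assumes the columns (features) are mutually independent, so the null
    distribution is simulated by permuting every column independently.
    """
    rng = np.random.default_rng(seed)
    observed = pairwise_distance_variance(X)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle each feature across samples, independently of the others
        Xp = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null[b] = pairwise_distance_variance(Xp)
    # One-sided p-value with the usual +1 correction
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two hypothetical subpopulations with very different allele frequencies
    pop_a = rng.binomial(1, 0.1, size=(20, 50))
    pop_b = rng.binomial(1, 0.9, size=(20, 50))
    stat, p = permutation_exchangeability_test(np.vstack([pop_a, pop_b]))
    print(f"statistic = {stat:.2f}, p-value = {p:.4f}")
```

In this toy setting a stratified sample produces a bimodal spread of pairwise distances (small within a subpopulation, large between them), so the observed variance is far larger than under column-wise shuffling and the p-value is small. Handling groups of dependent features would require permuting whole blocks of columns together rather than single columns.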
