Enhanced adaptive permutation test with negative binomial distribution in genome-wide omics datasets.

Author information

Huh Iksoo, Park Taesung

Affiliations

College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, 03080, Korea.

Department of Statistics, Seoul National University, Seoul, 08826, Korea.

Publication information

Genes Genomics. 2025 Jan;47(1):59-70. doi: 10.1007/s13258-024-01584-w. Epub 2024 Nov 6.

DOI:10.1007/s13258-024-01584-w

PMID:39503929

Abstract

BACKGROUND

The permutation test is widely used to obtain p-values when the standard test statistics do not follow parametric null distributions. However, it can require a very large number of iterations, especially when very small p-values must be resolved for multiple-testing adjustment in datasets with a large number of features.

OBJECTIVE

To overcome this computational burden, we propose a novel enhanced adaptive permutation test that estimates p-values using the negative binomial (NB) distribution. With this method, the number of permutations is determined separately for each feature according to its potential significance.
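The link between a stopping rule and the NB distribution can be written out as follows. This is a textbook inverse-sampling sketch, not necessarily the exact estimator used in the paper. The symbols are assumptions introduced here: $p$ is the true permutation p-value of a feature, $r$ is the predefined number of exceedances at which permutation stops, and $N$ is the number of permutations performed, with permutations drawn independently.

```latex
\begin{aligned}
P(N = n) &= \binom{n-1}{r-1}\, p^{r} (1-p)^{\,n-r}, \qquad n = r, r+1, \dots \\
E[N] &= \frac{r}{p}, \qquad \hat{p}_{\mathrm{MLE}} = \frac{r}{N}
\end{aligned}
```

Because $E[N] = r/p$, a feature with a large p-value stops after only a few permutations, while a feature with a very small p-value receives many, which is how the permutation count adapts to potential significance.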

METHODS

In detail, the permutation procedure stops once the test statistics from the permuted datasets have exceeded the observed statistic from the original dataset a predefined number of times. We showed that this procedure reduces the number of permutations, especially when there are many insignificant features. For significant features, we further reduced the number of permutations by splitting the dataset and combining the results with Stouffer's method.
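The stopping rule and the Stouffer combination described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the stopping count `r`, the cap `max_perm`, the estimator `r / n`, the fallback `(hits + 1) / (max_perm + 1)`, and the example statistic are all hypothetical choices. The abstract does not specify how the dataset is split, so `stouffer_combine` simply takes one-sided p-values that are assumed to come from independent splits.

```python
import math
from statistics import NormalDist

import numpy as np


def adaptive_perm_pvalue(x, y, stat, r=10, max_perm=100_000, rng=None):
    """Permute y until r permuted statistics reach the observed one.

    Returns (p_hat, n_perm). r, max_perm, and both estimators are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = stat(x, y)
    hits = 0
    for n in range(1, max_perm + 1):
        # ">=" counts ties as exceedances (a common convention, assumed here).
        if stat(x, rng.permutation(y)) >= observed:
            hits += 1
            if hits == r:
                # Inverse (negative binomial) sampling: n draws to reach r hits;
                # r / n is the maximum-likelihood estimate of the p-value.
                return r / n, n
    # Cap reached before r hits: fall back to the usual permutation estimate.
    return (hits + 1) / (max_perm + 1), max_perm


def stouffer_combine(p_values):
    """Combine one-sided p-values from independent splits (Stouffer's method)."""
    nd = NormalDist()
    # Clip so that inv_cdf never receives exactly 0 or 1.
    clipped = [min(max(p, 1e-300), 1.0 - 1e-16) for p in p_values]
    z = sum(-nd.inv_cdf(p) for p in clipped) / math.sqrt(len(clipped))
    return nd.cdf(-z)


def abs_mean_diff(x, y):
    """Example statistic: absolute difference in group means (x is 0/1)."""
    return abs(y[x == 1].mean() - y[x == 0].mean())


# Usage on simulated null data (no true group effect).
rng = np.random.default_rng(0)
x = np.repeat([0, 1], 50)
y = rng.normal(size=100)
p_hat, n_perm = adaptive_perm_pvalue(x, y, abs_mean_diff, r=10, rng=rng)
print(p_hat, n_perm)
```

For a null feature the loop typically ends after a few dozen permutations, whereas a strongly associated feature runs to the cap, which is the case the abstract targets with the split-and-combine step.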

RESULTS

In the simulation study, the enhanced adaptive permutation test dramatically reduced the number of permutations compared with the ordinary permutation test, while keeping the precision of the permutation p-value within a small range. In the real data analysis, we applied the enhanced adaptive permutation test to a genome-wide single nucleotide polymorphism (SNP) dataset of 327,872 features.
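To give a sense of the burden an ordinary permutation test would face on a dataset of this size, the arithmetic can be sketched as below. The feature count comes from the abstract; the Bonferroni correction at a family-wise level of 0.05 and the convention that the smallest attainable p-value with B permutations is 1 / (B + 1) are assumptions made here for illustration.

```python
import math
from fractions import Fraction

n_features = 327_872        # number of SNPs, from the abstract
alpha = Fraction(1, 20)     # assumed family-wise error rate of 0.05

# Bonferroni per-feature threshold (assumed correction, not from the source).
threshold = alpha / n_features

# With B permutations the smallest attainable p-value is 1 / (B + 1),
# so resolving the threshold needs B >= 1 / threshold - 1.
min_perm_per_feature = math.ceil(1 / threshold) - 1

# A fixed-B test would spend this many permutations on every feature.
total_perm = min_perm_per_feature * n_features

print(float(threshold), min_perm_per_feature, total_perm)
```

Under these assumptions each feature needs more than six million permutations, and a fixed-size test over all features needs on the order of 10^12, which is the cost an adaptive stopping rule avoids for the insignificant majority.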

CONCLUSION

We found that analysis with the enhanced adaptive permutation test ran in a feasible time for genome-wide omics datasets and successfully identified features with highly significant p-values and reasonable confidence intervals.
