Enhanced adaptive permutation test with negative binomial distribution in genome-wide omics datasets.

Author information

Huh Iksoo, Park Taesung

Affiliations

College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, 03080, Korea.

Department of Statistics, Seoul National University, Seoul, 08826, Korea.

Publication information

Genes Genomics. 2025 Jan;47(1):59-70. doi: 10.1007/s13258-024-01584-w. Epub 2024 Nov 6.

Abstract

BACKGROUND

The permutation test has been widely used to provide the p-values of statistical tests when the standard test statistics do not follow parametric null distributions. However, the permutation test may require huge numbers of iterations, especially when the detection of very small p-values is required for multiple testing adjustments in the analysis of datasets with a large number of features.

OBJECTIVE

To overcome this computational burden, we suggest a novel enhanced adaptive permutation test that estimates p-values using the negative binomial (NB) distribution. With this method, the number of permutations is determined separately for each feature according to its potential significance.

METHODS

Specifically, the permutation procedure stops once the test statistic from the permuted dataset has exceeded the observed statistic from the original dataset a predefined number of times. We showed that this procedure reduced the number of permutations, especially when there were many insignificant features. For significant features, we enhanced the reduction with Stouffer's method after splitting datasets.
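
The stopping rule and the combination step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters `r` and `max_perms`, and the test statistic (absolute difference in group means) are all hypothetical choices made for illustration. The estimate `r / n_perm` is the standard maximum-likelihood estimate under negative binomial (inverse) sampling, and the abstract does not say which estimator or which dataset-splitting scheme the paper uses.

```python
import random
from statistics import NormalDist, mean


def adaptive_permutation_pvalue(x, y, r=10, max_perms=100_000, seed=0):
    """Sequential permutation test that stops after r exceedances.

    Assumption: the statistic is |mean(x) - mean(y)|; the paper's
    statistic and estimator may differ.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))

    exceed = 0   # permuted statistics that reached the observed one
    n_perm = 0   # permutations drawn so far
    while exceed < r and n_perm < max_perms:
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        n_perm += 1
        if stat >= observed:
            exceed += 1

    if exceed == r:
        # Stopped by the rule: n_perm is negative binomial, MLE is r / n_perm.
        p_hat = r / n_perm
    else:
        # Hit the cap first: fall back to the usual add-one estimate.
        p_hat = (exceed + 1) / (n_perm + 1)
    return p_hat, n_perm


def stouffer_combine(p_values):
    """Combine independent one-sided p-values with Stouffer's Z method.

    Each p-value must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in p_values) / len(p_values) ** 0.5
    return nd.cdf(-z)


# A feature with no group difference stops after very few permutations.
p_null, n_null = adaptive_permutation_pvalue([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])

# A strongly separated feature keeps permuting until the cap is reached.
p_sig, n_sig = adaptive_permutation_pvalue(
    list(range(10)), list(range(100, 110)), max_perms=500
)
```

In this sketch an insignificant feature costs only a handful of permutations, while a significant one runs until `max_perms`. That second case is the one the abstract addresses by splitting the dataset and combining the resulting p-values with Stouffer's method.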

RESULTS

In the simulation study, the enhanced adaptive permutation test dramatically reduced the number of permutations compared with the ordinary permutation test, while keeping the precision of the permutation p-value within a small range. In the real data analysis, we applied the enhanced adaptive permutation test to a genome-wide single nucleotide polymorphism (SNP) dataset of 327,872 features.

CONCLUSION

We found that the analysis with the enhanced adaptive permutation test ran in a feasible time for genome-wide omics datasets and successfully identified features with highly significant p-values and reasonable confidence intervals.
