Fewer permutations, more accurate P-values.

Authors

Knijnenburg Theo A, Wessels Lodewyk F A, Reinders Marcel J T, Shmulevich Ilya

Affiliation

Institute for Systems Biology, Seattle, WA, USA.

Publication

Bioinformatics. 2009 Jun 15;25(12):i161-8. doi: 10.1093/bioinformatics/btp211.

DOI:10.1093/bioinformatics/btp211

PMID:19477983

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2687965/

Abstract

MOTIVATION

Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. Thereby, it imposes upon itself the need for a very large number of permutations when small P-values are to be accurately estimated. This is computationally expensive and often infeasible.
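The coupling between the number of permutations and the P-value resolution can be illustrated with a minimal sketch. It assumes a one-sided test in which "at least as extreme" means greater than or equal to the observed statistic, and it uses a difference in group means as a stand-in statistic. All function names and the toy data are hypothetical and do not come from the paper.

```python
import random


def empirical_p_value(observed, permuted_stats):
    """Fraction of permutation statistics that are >= the observed one."""
    exceed = sum(1 for s in permuted_stats if s >= observed)
    return exceed / len(permuted_stats)


def mean_difference(a, b):
    """Illustrative test statistic: difference of the two group means."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_stats(a, b, n_perm, rng):
    """Recompute the statistic on n_perm random relabelings of the pooled data."""
    pooled = list(a) + list(b)
    stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stats.append(mean_difference(pooled[:len(a)], pooled[len(a):]))
    return stats


rng = random.Random(0)
group_a = [rng.gauss(1.0, 1.0) for _ in range(20)]  # toy data, assumption
group_b = [rng.gauss(0.0, 1.0) for _ in range(20)]  # toy data, assumption

n_perm = 1000
observed = mean_difference(group_a, group_b)
perm = permutation_stats(group_a, group_b, n_perm, rng)
p = empirical_p_value(observed, perm)

# Every estimate is a multiple of 1/n_perm, so 1/n_perm is both the
# resolution and the smallest nonzero value this estimator can return.
resolution = 1 / n_perm
print(f"P = {p}, resolution = {resolution}")
```

With this estimator, resolving a P-value on the order of 1e-6 would take on the order of a million permutations or more, which is the cost the abstract refers to.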

RESULTS

A method of computing P-values based on tail approximation is presented. The tail of the distribution of permutation values is approximated by a generalized Pareto distribution. A good fit and thus accurate P-value estimates can be obtained with a drastically reduced number of permutations when compared with the standard empirical way of computing P-values.
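The tail-approximation idea can be sketched as follows. The abstract does not say how the generalized Pareto distribution (GPD) is fitted or how the tail is selected, so this example makes several assumptions: it fits by the method of moments (a simple substitute, not necessarily the authors' estimator), treats the 250 largest permutation values as the tail, places the threshold midway between the last tail value and the next one, and scales the fitted tail probability by the fraction of values in the tail. Function names and defaults are hypothetical.

```python
import math
import random


def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape, scale) for GPD exceedances."""
    n = len(exceedances)
    m = sum(exceedances) / n
    v = sum((y - m) ** 2 for y in exceedances) / (n - 1)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)


def gpd_survival(y, shape, scale):
    """P(Y > y) for a GPD with location 0, evaluated at y >= 0."""
    if abs(shape) < 1e-12:
        return math.exp(-y / scale)  # exponential limit as shape -> 0
    base = 1.0 + shape * y / scale
    if base <= 0.0:
        return 0.0  # beyond the finite upper endpoint (negative shape)
    return base ** (-1.0 / shape)


def tail_p_value(observed, permuted_stats, n_exceed=250):
    """One-sided P-value; GPD tail estimate when observed is above the threshold."""
    ordered = sorted(permuted_stats, reverse=True)
    n = len(ordered)  # requires n > n_exceed
    threshold = 0.5 * (ordered[n_exceed - 1] + ordered[n_exceed])
    if observed <= threshold:
        # Not in the tail: the plain empirical fraction is adequate here.
        return sum(1 for s in ordered if s >= observed) / n
    exceedances = [s - threshold for s in ordered[:n_exceed]]
    shape, scale = fit_gpd_moments(exceedances)
    return (n_exceed / n) * gpd_survival(observed - threshold, shape, scale)


# Toy null distribution standing in for permutation values (assumption).
rng = random.Random(1)
null_stats = [rng.gauss(0.0, 1.0) for _ in range(5000)]
observed = 3.5

empirical = sum(1 for s in null_stats if s >= observed) / len(null_stats)
approx = tail_p_value(observed, null_stats)
print(f"empirical = {empirical}, GPD tail estimate = {approx}")
```

In this sketch the empirical estimate cannot fall below 1/5000 without becoming zero, while the tail estimate can take smaller nonzero values. A fitted negative shape implies a finite upper endpoint, and an observed value beyond it yields an estimate of exactly zero, so the quality of the fit should be checked before trusting the result.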

AVAILABILITY

The Matlab code can be obtained from the corresponding author on request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
