Statistical significance for genomewide studies.

Authors

Storey John D, Tibshirani Robert

Affiliation

Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Publication information

Proc Natl Acad Sci U S A. 2003 Aug 5;100(16):9440-5. doi: 10.1073/pnas.1530509100. Epub 2003 Jul 25.

DOI:10.1073/pnas.1530509100

PMID:12883005

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC170937/

Abstract

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features in a genomewide data set are tested against some null hypothesis, where a number of features are expected to be significant. Here we propose an approach to measuring statistical significance in these genomewide studies based on the concept of the false discovery rate. This approach offers a sensible balance between the number of true and false positives that is automatically calibrated and easily interpreted. In doing so, a measure of statistical significance called the q value is associated with each tested feature. The q value is similar to the well known p value, except it is a measure of significance in terms of the false discovery rate rather than the false positive rate. Our approach avoids a flood of false positive results, while offering a more liberal criterion than what has been used in genome scans for linkage.
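The abstract describes the q value only in words. As a rough illustration, the sketch below computes q values from a list of p values in Python. It is not taken from the abstract: the function names `estimate_pi0` and `qvalues` are hypothetical, and the proportion of true null features is estimated with a single fixed tuning parameter `lam = 0.5`, which is a simplifying assumption rather than the authors' full procedure.

```python
import numpy as np


def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of truly null features.

    Assumption: a single fixed tuning parameter `lam` is used. Null p values
    are uniform on [0, 1], so the fraction of p values above `lam`, divided
    by (1 - lam), approximates the null proportion.
    """
    p = np.asarray(pvalues, dtype=float)
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return min(pi0, 1.0)  # a proportion cannot exceed 1


def qvalues(pvalues, lam=0.5):
    """Return one q value per p value, in the original input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    pi0 = estimate_pi0(p, lam)

    order = np.argsort(p)            # indices that sort p ascending
    ranks = np.arange(1, m + 1)      # number of features called at each cutoff

    # Estimated false discovery rate if the cutoff sits at each sorted p value.
    fdr = pi0 * m * p[order] / ranks

    # q value: the smallest estimated FDR among all cutoffs that still include
    # the feature (running minimum taken from the largest p value downward).
    q_sorted = np.minimum.accumulate(fdr[::-1])[::-1]
    q_sorted = np.minimum(q_sorted, 1.0)

    q = np.empty(m)
    q[order] = q_sorted              # restore the input order
    return q


if __name__ == "__main__":
    # Hypothetical p values, for illustration only.
    example = [0.8, 0.01, 0.04, 0.6, 0.02, 0.03]
    print(qvalues(example))
```

Under this sketch, calling every feature with a q value of at most 0.05 significant would mean that roughly 5% of the called features are expected to be false positives, which is the false discovery rate interpretation the abstract contrasts with the false positive rate of the p value.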

Similar articles

The false discovery rate: a key concept in large-scale genetic studies.

Cancer Control. 2010 Jan;17(1):58-62. doi: 10.1177/107327481001700108.

Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies.

Am J Hum Genet. 2004 Sep;75(3):424-35. doi: 10.1086/423738. Epub 2004 Jul 19.

Rank order metrics for quantifying the association of sequence features with gene regulation.

Bioinformatics. 2003 Jan 22;19(2):212-8. doi: 10.1093/bioinformatics/19.2.212.

Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures.

BMC Bioinformatics. 2007 May 18;8:157. doi: 10.1186/1471-2105-8-157.

Empirical Bayes screening of many p-values with applications to microarray studies.

Bioinformatics. 2005 May 1;21(9):1987-94. doi: 10.1093/bioinformatics/bti301. Epub 2005 Feb 2.

Determination of the differentially expressed genes in microarray experiments using local FDR.

BMC Bioinformatics. 2004 Sep 6;5:125. doi: 10.1186/1471-2105-5-125.

Poisson approximation for significance in genome-wide ChIP-chip tiling arrays.

Bioinformatics. 2008 Dec 15;24(24):2825-31. doi: 10.1093/bioinformatics/btn549. Epub 2008 Oct 25.

Relaxed significance criteria for linkage analysis.

Genetics. 2006 Aug;173(4):2371-81. doi: 10.1534/genetics.105.052506. Epub 2006 Jun 18.

Normal uniform mixture differential gene expression detection for cDNA microarrays.

BMC Bioinformatics. 2005 Jul 12;6:173. doi: 10.1186/1471-2105-6-173.

Cited by

DIAMOND2GO: rapid Gene Ontology assignment and enrichment detection for functional genomics.

Front Bioinform. 2025 Aug 15;5:1634042. doi: 10.3389/fbinf.2025.1634042. eCollection 2025.

Exploring the Causal Links Between 338 Cerebrospinal Fluid Metabolites and Parkinson's Disease.

Brain Behav. 2025 Sep;15(9):e70815. doi: 10.1002/brb3.70815.

Haptoglobin phenotypes and structural variants associate with post-exertional malaise and cognitive dysfunction in myalgic encephalomyelitis.

J Transl Med. 2025 Aug 28;23(1):970. doi: 10.1186/s12967-025-07006-z.

Genome-wide association and fine-mapping analyses identify novel candidate genes affecting serum cortisol levels using imputed whole-genome sequencing data in pigs.

J Anim Sci Technol. 2025 Jul;67(4):759-772. doi: 10.5187/jast.2024.e83. Epub 2025 Jul 31.

Gene expression QTL mapping in stimulated iPSC-derived macrophages provides insights into common complex diseases.

Nat Commun. 2025 Aug 27;16(1):7204. doi: 10.1038/s41467-025-61670-9.

Inhibition of Cardiac p38 Highlights the Role of the Phosphoproteome in Heart Failure Progression.

ACS Omega. 2025 Aug 6;10(32):36082-36097. doi: 10.1021/acsomega.5c03687. eCollection 2025 Aug 19.

Response splicing quantitative trait loci in primary human chondrocytes identify putative osteoarthritis risk genes.

Nat Commun. 2025 Aug 26;16(1):7932. doi: 10.1038/s41467-025-63299-0.

A study of gene expression in the living human brain.

Mol Psychiatry. 2025 Aug 23. doi: 10.1038/s41380-025-03163-1.

Germline structural variations involving the pediatric brain tumor transcriptome include disease-relevant and ancestry-related genes.

Acta Neuropathol Commun. 2025 Aug 20;13(1):179. doi: 10.1186/s40478-025-02098-6.

Unique genetic signatures in HIV-1 subtype A1 and A1D recombinant envelope glycoprotein distinguish contemporary transmitted/founder viruses from historical strains in East Africa.

Front Microbiol. 2025 Aug 4;16:1632581. doi: 10.3389/fmicb.2025.1632581. eCollection 2025.
