Detecting multiple associations in genome-wide studies.

Author information

Dudbridge Frank, Gusnanto Arief, Koeleman Bobby P C

Affiliation

MRC Biostatistics Unit, Cambridge, UK.

Publication information

Hum Genomics. 2006 Mar;2(5):310-7. doi: 10.1186/1479-7364-2-5-310.

Abstract

Recent developments in the statistical analysis of genome-wide studies are reviewed. Genome-wide analyses are becoming increasingly common in areas such as scans for disease-associated markers and gene expression profiling. The data generated by these studies present new problems for statistical analysis, owing to the large number of hypothesis tests, comparatively small sample size and modest number of true gene effects. In this review, strategies are described for optimising the genotyping cost by discarding unpromising genes at an earlier stage, saving resources for the genes that show a trend of association. In addition, there is a review of new methods of analysis that combine evidence across genes to increase sensitivity to multiple true associations in the presence of many non-associated genes. Some methods achieve this by including only the most significant results, whereas others model the overall distribution of results as a mixture of distributions from true and null effects. Because genes are correlated even when they have no effect, permutation testing is often necessary to estimate the overall significance, but this can be very time consuming. Efficiency can be improved by fitting a parametric distribution to permutation replicates, which can be re-used in subsequent analyses. Methods are also available to generate random draws from the permutation distribution. The review also includes discussion of new error measures that give a more reasonable interpretation of genome-wide studies, together with improved sensitivity. The false discovery rate allows a controlled proportion of positive results to be false, while detecting more true positives; and the local false discovery rate and false-positive report probability give clarity on whether or not a statistically significant test represents a real discovery.
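
The two error measures named at the end of the abstract can be illustrated with a short sketch. It is not taken from the review: it assumes the standard Benjamini-Hochberg step-up procedure as one way of controlling the false discovery rate, and the usual definition of the false-positive report probability as alpha(1 - prior) / [alpha(1 - prior) + power * prior]. The function names, the p-values and the prior are hypothetical and chosen only for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Flag the tests rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def fprp(alpha: float, power: float, prior: float) -> float:
    """False-positive report probability: P(no true effect | test significant at alpha)."""
    false_pos = alpha * (1.0 - prior)  # probability of a significant null effect
    true_pos = power * prior           # probability of a significant true effect
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Hypothetical p-values from ten tests (illustrative only).
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    bh = benjamini_hochberg(pvals, q=0.05)
    bonferroni = [p <= 0.05 / len(pvals) for p in pvals]
    print("BH rejections:", sum(bh))
    print("Bonferroni rejections:", sum(bonferroni))
    # Assumed prior of 1 in 1,000 that a tested marker is truly associated.
    print("FPRP:", round(fprp(alpha=0.05, power=0.8, prior=0.001), 3))
```

With these made-up inputs the step-up rule rejects one more hypothesis than a Bonferroni threshold at the same nominal level, which is the gain in sensitivity the abstract refers to. The FPRP call shows why a nominally significant test can still be very likely to be a false positive when the prior probability of a true association is small.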
