Detecting multiple associations in genome-wide studies.

Author information

Frank Dudbridge, Arief Gusnanto, Bobby P C Koeleman

Affiliation

MRC Biostatistics Unit, Cambridge, UK.

Publication information

Hum Genomics. 2006 Mar;2(5):310-7. doi: 10.1186/1479-7364-2-5-310.

DOI:10.1186/1479-7364-2-5-310

PMID:16595075

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3500180/

Abstract

Recent developments in the statistical analysis of genome-wide studies are reviewed. Genome-wide analyses are becoming increasingly common in areas such as scans for disease-associated markers and gene expression profiling. The data generated by these studies present new problems for statistical analysis, owing to the large number of hypothesis tests, comparatively small sample size and modest number of true gene effects. In this review, strategies are described for optimising the genotyping cost by discarding unpromising genes at an earlier stage, saving resources for the genes that show a trend of association. In addition, there is a review of new methods of analysis that combine evidence across genes to increase sensitivity to multiple true associations in the presence of many non-associated genes. Some methods achieve this by including only the most significant results, whereas others model the overall distribution of results as a mixture of distributions from true and null effects. Because genes are correlated even when having no effect, permutation testing is often necessary to estimate the overall significance, but this can be very time consuming. Efficiency can be improved by fitting a parametric distribution to permutation replicates, which can be re-used in subsequent analyses. Methods are also available to generate random draws from the permutation distribution. The review also includes discussion of new error measures that give a more reasonable interpretation of genome-wide studies, together with improved sensitivity. The false discovery rate allows a controlled proportion of positive results to be false, while detecting more true positives; and the local false discovery rate and false-positive report probability give clarity on whether or not a statistically significant test represents a real discovery.
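
As a rough illustration of two of the error measures named in the abstract, the sketch below implements the Benjamini-Hochberg step-up procedure for controlling the false discovery rate and the standard formula for the false-positive report probability. It is not taken from the review itself: the function names, the example P values, and the chosen prior and power are all illustrative assumptions, and the Benjamini-Hochberg procedure is only one of several ways to control the false discovery rate.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of tests rejected at FDR level q.

    Step-up rule: find the largest rank k such that the k-th smallest
    P value is <= k * q / m, then reject the k smallest P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


def fprp(alpha, power, prior):
    """False-positive report probability for a test significant at level alpha.

    alpha: significance threshold; power: probability of detecting a true
    effect at that threshold; prior: prior probability that the tested
    association is real (an assumption supplied by the analyst).
    """
    null_hits = alpha * (1.0 - prior)   # expected share of false positives
    true_hits = power * prior           # expected share of true positives
    return null_hits / (null_hits + true_hits)


# Hypothetical P values from 15 tests (illustrative only).
example_p = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298,
             0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0]

fdr_hits = benjamini_hochberg(example_p, q=0.05)
bonferroni_hits = [i for i, p in enumerate(example_p)
                   if p <= 0.05 / len(example_p)]
print("FDR rejections:", fdr_hits)
print("Bonferroni rejections:", bonferroni_hits)
print("FPRP at alpha=0.05, power=0.8, prior=0.001:", fprp(0.05, 0.8, 0.001))
```

With these made-up inputs the false discovery rate procedure rejects one more test than a Bonferroni correction, which is the gain in sensitivity the abstract refers to. The false-positive report probability is close to 1 when the prior probability of a true association is very low, which shows why a nominally significant test need not represent a real discovery.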
