Methodological Issues in Multistage Genome-wide Association Studies.

Author information

Thomas Duncan C, Casey Graham, Conti David V, Haile Robert W, Lewinger Juan Pablo, Stram Daniel O

Affiliation

Department of Preventive Medicine, University of Southern California.

Publication information

Stat Sci. 2009 Nov 1;24(4):414-429. doi: 10.1214/09-sts288.

Abstract

Because of the high cost of commercial genotyping chip technologies, many investigations have used a two-stage design for genome-wide association studies. Part of the sample is used for an initial discovery of "promising" SNPs at a less stringent significance level, and the remainder is custom-genotyped for just these SNPs, with the data from both stages analyzed jointly. By using about half the sample for stage I and carrying about 0.1% of SNPs forward to stage II, this design typically saves about 50% of genotyping costs while achieving comparable overall type I error and power; the optimal design depends primarily on the ratio of per-genotype costs in stages I and II. However, with the rapidly declining cost of commercial panels, the generally low odds ratios (ORs) observed in current studies, and many studies aiming to test multiple hypotheses and multiple endpoints, many investigators are abandoning the two-stage design in favor of simply genotyping all available subjects on a standard high-density panel. Concern is sometimes raised about the absence of a "replication" panel in this approach, as some high-profile journals require, but the two-stage design is not a discovery/replication design; it is simply a more efficient design for discovery, based on a joint analysis of the data from both stages. Once a subset of highly significant associations has been discovered, a truly independent "exact replication" study of the same promising SNPs is needed in a similar population, using similar methods. This can then be followed by (1) "generalizability" studies to assess the full scope of replicated associations across different races, endpoints, interactions, etc.; (2) fine-mapping or re-sequencing to try to identify the causal variant; and (3) experimental studies of the biological function of these genes.
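The cost and power trade-offs of the two-stage design can be explored numerically. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: genotyping cost is proportional to the number of genotypes (no fixed costs), each SNP is tested with a Z statistic, and the joint statistic is the sample-size-weighted combination of the stage I and stage II statistics. The function names `relative_cost` and `two_stage_power`, the cost ratio, and the effect size are hypothetical. For simplicity the joint test reuses the one-stage Bonferroni threshold, which is conservative; an optimized design would instead compute the exact joint threshold given the stage I selection rule.

```python
"""Sketch: relative cost and power of a two-stage GWAS with joint analysis.

Assumptions (illustrative, not from the paper):
- cost is proportional to the number of genotypes, with no fixed costs;
- the full-sample Z equals sqrt(pi_sample) * Z1 + sqrt(1 - pi_sample) * Z2,
  where Z1 and Z2 come from the independent stage I and stage II subjects;
- the joint test uses the one-stage Bonferroni threshold (conservative).
"""
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def relative_cost(pi_sample, pi_markers, cost_ratio):
    """Two-stage cost divided by the cost of genotyping everyone on the panel.

    pi_sample:  fraction of subjects genotyped on the panel in stage I
    pi_markers: fraction of SNPs carried forward to stage II
    cost_ratio: per-genotype cost in stage II (custom) / cost in stage I (panel)
    """
    return pi_sample + (1 - pi_sample) * pi_markers * cost_ratio


def two_stage_power(mu, n_snps, alpha, pi_sample, pi_markers,
                    n_sims=20000, seed=1):
    """Monte Carlo power for one causal SNP whose full-sample Z has mean mu.

    Returns (two_stage_power, one_stage_power), estimated from the same draws.
    """
    rng = random.Random(seed)
    # Under the null, a two-sided stage I p-value below pi_markers carries
    # forward roughly that fraction of SNPs.
    c1 = STD_NORMAL.inv_cdf(1 - pi_markers / 2)
    # Genome-wide Bonferroni threshold applied to the joint statistic.
    c_joint = STD_NORMAL.inv_cdf(1 - alpha / (2 * n_snps))
    w1, w2 = sqrt(pi_sample), sqrt(1 - pi_sample)
    hits_two = hits_one = 0
    for _ in range(n_sims):
        z1 = rng.gauss(mu * w1, 1.0)   # stage I statistic
        z2 = rng.gauss(mu * w2, 1.0)   # stage II statistic
        z_joint = w1 * z1 + w2 * z2    # same distribution as the full-sample Z
        if abs(z_joint) > c_joint:
            hits_one += 1              # one-stage: everyone on the panel
            if abs(z1) > c1:
                hits_two += 1          # two-stage: must also pass stage I
    return hits_two / n_sims, hits_one / n_sims


if __name__ == "__main__":
    print(relative_cost(0.5, 0.001, 20))
    print(two_stage_power(5.5, 500_000, 0.05, 0.5, 0.001))
```

With half the sample in stage I, 0.1% of SNPs carried forward, and an assumed 20-fold per-genotype cost for custom genotyping, `relative_cost` gives about 0.51, i.e. roughly 49% savings. Because a SNP must pass stage I before the joint test, the two-stage power returned by `two_stage_power` can never exceed the one-stage power computed from the same draws; the gap between them shows how much power the cheaper design gives up.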
Multistage sampling designs may be more useful at this stage, for example in selecting subsets of subjects for deep re-sequencing of regions identified in the GWAS.
