Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation.

Authors

Moskvina V, O'Donovan M C

Affiliation

Department of Psychological Medicine, Wales College of Medicine, Cardiff University, Cardiff, UK.

Publication details

Hum Hered. 2007;64(1):63-73. doi: 10.1159/000101424. Epub 2007 Apr 27.

DOI:10.1159/000101424
PMID:17483598
Abstract

OBJECTIVES

Genetic association studies are usually based upon restricted sets of 'tag' markers selected to represent the total sequence variation. Tag selection is often determined by some threshold for the r² coefficients of linkage disequilibrium (LD) between tag and untyped markers, it being widely assumed that power to detect an effect at the untyped sites is retained by typing the tag marker in a sample scaled by the inverse of the selected threshold (1/r²). However, unless only a single causal variant occurs at a locus, it has been shown [Eur J Hum Genet 2006;14:426-437] that significant power loss can occur if this principle is applied. We sought to investigate whether unexpected loss of power might be an exceptional case or more general concern. In the absence of detailed knowledge about the genetic architecture at complex disease loci, we developed a mathematical approach to test all possible situations.
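
For reference, the scaling rule questioned here can be written out. This uses the standard definition of r² between two biallelic loci (alleles A and B with frequencies p_A and p_B, and haplotype frequency p_AB); the notation is an assumption for illustration and is not taken from the paper itself:

```latex
D = p_{AB} - p_A p_B, \qquad
r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}, \qquad
N_{\text{tag}} \approx \frac{N_{\text{causal}}}{r^2}
```

With a threshold of r² = 0.8, for example, the rule predicts that a sample 1/0.8 = 1.25 times larger is needed when typing the tag marker rather than the causal variant.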

METHODS

We derived mathematical formulae allowing the calculation of all possible odds ratios (OR) at a tag marker locus given the effect size that would be observed by typing a second locus and the r² between the two loci. For a range of allele frequencies, r² between loci, and strengths of association at the causal locus (OR from 0.5 to 2) that we consider realistic for complex disease loci, we next determined the sample sizes that would be necessary to give equivalent power to detect association by genotyping tag and causal loci and compared these with the sample sizes predicted by applying 1/r².
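
The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the authors' derivation: it assumes a single causal variant (the tag allele affects risk only through LD with the causal allele), an allele-based two-proportion z-test, and illustrative input values. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_freq(p_ctrl, odds_ratio):
    """Allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_ctrl / (1.0 - p_ctrl)
    return odds / (1.0 + odds)


def alleles_per_group(p_ctrl, p_case, alpha=0.05, power=0.8):
    """Alleles needed per group for a two-sided two-proportion z-test."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_ctrl + p_case) / 2.0
    num = z_a * sqrt(2.0 * p_bar * (1.0 - p_bar)) + z_b * sqrt(
        p_ctrl * (1.0 - p_ctrl) + p_case * (1.0 - p_case)
    )
    return (num / (p_case - p_ctrl)) ** 2


def tag_case_freq(p_ctrl, p_case, q_ctrl, r):
    """Tag-allele frequency in cases, ASSUMING a single causal variant.

    r is the signed LD correlation between the causal and tag alleles.
    """
    d = r * sqrt(p_ctrl * (1.0 - p_ctrl) * q_ctrl * (1.0 - q_ctrl))
    b_given_a = q_ctrl + d / p_ctrl                # P(tag | causal allele)
    b_given_not_a = q_ctrl - d / (1.0 - p_ctrl)    # P(tag | other allele)
    return p_case * b_given_a + (1.0 - p_case) * b_given_not_a


# Illustrative inputs (assumed values, not taken from the paper).
p, q, r2, odds_ratio = 0.3, 0.3, 0.8, 1.5
p_case = case_freq(p, odds_ratio)
q_case = tag_case_freq(p, p_case, q, sqrt(r2))

n_causal = alleles_per_group(p, p_case)
n_tag = alleles_per_group(q, q_case)
ratio = n_tag / n_causal
print(f"tag/causal sample-size ratio: {ratio:.3f}; 1/r^2: {1.0 / r2:.3f}")
```

Under this single-variant assumption the ratio stays close to 1/r². The abstract's concern is with loci where that assumption fails, for which the required inflation can be larger.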

RESULTS

Under most of the hypothetical scenarios we examined, the calculated sample sizes required to maintain power by typing markers that tag the causal locus at even moderately high r² (0.8) were greater than that calculated by applying 1/r². Even in populations with apparently similar measurements of allele frequency, LD structure, and effect size at the susceptibility allele, the required sample size to detect association with a tag marker can vary substantially. We also show that in apparently similar populations, associations to either allele at the tag site are possible.

CONCLUSIONS

Indirect tests of association are less powered than sizes predicted by applying 1/r² in the majority of hypothetical scenarios we examined. Our findings pertain even for what we consider likely to be larger than average effect sizes in complex diseases (OR = 1.5-2) and even for moderately high r² values between the markers. Until a substantial number of disease genes have been identified through methods that are not based on tagging, and therefore biased towards those situations most favourable to tagging, it is impossible to know how the true scenarios are distributed across the range of possible scenarios. Nevertheless, while association designs based upon tag marker selection by necessity are the tool of choice for de novo gene discovery, our data suggest power to initially detect association may often be less than assumed. Moreover, our data suggest that to avoid genuine findings being subsequently discarded by unpredictable losses of power, follow-up studies in other samples should be based upon more detailed analyses of the gene rather than simply on the tag SNPs showing association in the discovery study.

Similar articles

1
Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation.
Hum Hered. 2007;64(1):63-73. doi: 10.1159/000101424. Epub 2007 Apr 27.
2
The power of genome-wide association studies of complex disease genes: statistical limitations of indirect approaches using SNP markers.
J Hum Genet. 2001;46(8):478-82. doi: 10.1007/s100380170048.
3
Quantifying the amount of missing information in genetic association studies.
Genet Epidemiol. 2006 Dec;30(8):703-17. doi: 10.1002/gepi.20181.
4
Further investigation of linkage disequilibrium SNPs and their ability to identify associated susceptibility loci.
Ann Hum Genet. 2004 May;68(Pt 3):240-8. doi: 10.1046/j.1529-8817.2004.00086.x.
5
Sample size requirements for indirect association studies of gene-environment interactions (G x E).
Genet Epidemiol. 2008 Apr;32(3):235-45. doi: 10.1002/gepi.20298.
6
On selecting markers for association studies: patterns of linkage disequilibrium between two and three diallelic loci.
Genet Epidemiol. 2003 Jan;24(1):57-67. doi: 10.1002/gepi.10217.
7
Sample size calculations for population- and family-based case-control association studies on marker genotypes.
Genet Epidemiol. 2003 Sep;25(2):136-48. doi: 10.1002/gepi.10245.
8
Efficiency and power in genetic association studies.
Nat Genet. 2005 Nov;37(11):1217-23. doi: 10.1038/ng1669. Epub 2005 Oct 23.
9
Susceptibility of biallelic haplotype and genotype frequencies to genotyping error.
Biometrics. 2006 Dec;62(4):1116-23. doi: 10.1111/j.1541-0420.2006.00563.x.
10
Issues concerning association studies for fine mapping a susceptibility gene for a complex disease.
Genet Epidemiol. 2001 May;20(4):432-57. doi: 10.1002/gepi.1012.

Cited by

1
[Association study between haplotypes of WNT signaling pathway genes and nonsyndromic oral clefts among Chinese Han populations].
Beijing Da Xue Xue Bao Yi Xue Ban. 2022 Jun 18;54(3):394-399. doi: 10.19723/j.issn.1671-167X.2022.03.002.
2
Haplotype and Haplotype-Environment Interaction Analysis Revealed Roles of SPRY2 for NSCL/P among Chinese Populations.
Int J Environ Res Public Health. 2019 Feb 15;16(4):557. doi: 10.3390/ijerph16040557.
3
Genomic variants, genes, and pathways of Alzheimer's disease: An overview.
Am J Med Genet B Neuropsychiatr Genet. 2017 Jan;174(1):5-26. doi: 10.1002/ajmg.b.32499.
4
Gene-wide analysis detects two new susceptibility genes for Alzheimer's disease.
PLoS One. 2014 Jun 12;9(6):e94661. doi: 10.1371/journal.pone.0094661. eCollection 2014.
5
Windfalls and pitfalls: Applications of population genetics to the search for disease genes.
Evol Med Public Health. 2013 Jan;2013(1):254-72. doi: 10.1093/emph/eot021. Epub 2013 Nov 6.
6
Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder.
Mol Psychiatry. 2011 Apr;16(4):429-41. doi: 10.1038/mp.2010.36. Epub 2010 Apr 6.
7
Transferability and fine-mapping of genome-wide associated loci for adult height across human populations.
PLoS One. 2009 Dec 22;4(12):e8398. doi: 10.1371/journal.pone.0008398.
8
Neuregulin 1 and age of onset in the major psychoses.
J Neural Transm (Vienna). 2009 Apr;116(4):479-86. doi: 10.1007/s00702-008-0182-9. Epub 2009 Jan 28.
9
Is replication the gold standard for validating genome-wide association findings?
PLoS One. 2008;3(12):e4037. doi: 10.1371/journal.pone.0004037. Epub 2008 Dec 29.
10
Gene-wide analyses of genome-wide association data sets: evidence for multiple common risk alleles for schizophrenia and bipolar disorder and for overlap in genetic risk.
Mol Psychiatry. 2009 Mar;14(3):252-60. doi: 10.1038/mp.2008.133. Epub 2008 Dec 9.