Yaghootkar Hanieh, Bancks Michael P, Jones Sam E, McDaid Aaron, Beaumont Robin, Donnelly Louise, Wood Andrew R, Campbell Archie, Tyrrell Jessica, Hocking Lynne J, Tuke Marcus A, Ruth Katherine S, Pearson Ewan R, Murray Anna, Freathy Rachel M, Munroe Patricia B, Hayward Caroline, Palmer Colin, Weedon Michael N, Pankow James S, Frayling Timothy M, Kutalik Zoltán
Genetics of Complex Traits, University of Exeter Medical School, University of Exeter, Exeter, UK.
Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, USA.
Hum Mol Genet. 2017 Mar 1;26(5):1018-1030. doi: 10.1093/hmg/ddw433.
As genetic association studies increase in size to 100 000s of individuals, subtle biases may influence conclusions. One possible bias is 'index event bias' (IEB) that appears due to the stratification by, or enrichment for, disease status when testing associations between genetic variants and a disease-associated trait. We aimed to test the extent to which IEB influences some known trait associations in a range of study designs and provide a statistical framework for assessing future associations. Analyzing data from 113 203 non-diabetic UK Biobank participants, we observed three (near TCF7L2, CDKN2AB and CDKAL1) overestimated (body mass index (BMI) decreasing) and one (near MTNR1B) underestimated (BMI increasing) associations among 11 type 2 diabetes risk alleles (at P < 0.05). IEB became even stronger when we tested a type 2 diabetes genetic risk score composed of these 11 variants (-0.010 standard deviations BMI per allele, P = 5 × 10⁻⁴), which was confirmed in four additional independent studies. Similar results emerged when examining the effect of blood pressure increasing alleles on BMI in normotensive UK Biobank samples. Furthermore, we demonstrated that, under realistic scenarios, common disease alleles would become associated at P < 5 × 10⁻⁸ with disease-related traits through IEB alone, if disease prevalence in the sample differs appreciably from the background population prevalence. For example, some hypertension and type 2 diabetes alleles will be associated with BMI in sample sizes of >500 000 if the prevalence of those diseases differs by >10% from the background population. In conclusion, IEB may result in false positive or negative genetic associations in very large studies stratified or strongly enriched for/against disease cases.
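The mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' statistical framework; it assumes a simple liability-threshold model in which a risk allele and BMI each raise disease liability independently, so the allele has no true effect on BMI. All parameter values (allele frequency, effect sizes, prevalence) and function names are hypothetical and deliberately exaggerated so that the bias is visible in a modest sample. Restricting the analysis to non-cases should induce a negative allele-BMI association, in the same direction as the BMI-decreasing bias reported for the non-diabetic UK Biobank sample.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares regression line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_ieb(n=200_000, risk_allele_freq=0.3, allele_effect=0.3,
                 bmi_effect=0.5, prevalence=0.10, seed=1):
    """Return the allele-BMI slope in the full sample and in non-cases only.

    Assumed model (illustrative, not taken from the paper):
    liability = allele_effect * genotype + bmi_effect * bmi + noise,
    and the top `prevalence` fraction of liabilities are disease cases.
    """
    rng = random.Random(seed)
    genotype, bmi, liability = [], [], []
    for _ in range(n):
        # Risk-allele count (0, 1 or 2), drawn independently of BMI.
        g = (rng.random() < risk_allele_freq) + (rng.random() < risk_allele_freq)
        b = rng.gauss(0.0, 1.0)  # BMI in standard-deviation units
        genotype.append(g)
        bmi.append(b)
        liability.append(allele_effect * g + bmi_effect * b + rng.gauss(0.0, 1.0))

    # Liability cut-off that yields the requested sample prevalence.
    threshold = sorted(liability)[int((1 - prevalence) * n)]
    controls = [i for i in range(n) if liability[i] < threshold]

    slope_all = ols_slope(genotype, bmi)
    slope_controls = ols_slope([genotype[i] for i in controls],
                               [bmi[i] for i in controls])
    return slope_all, slope_controls


slope_all, slope_controls = simulate_ieb()
print(f"BMI SD per risk allele, full sample:    {slope_all:+.4f}")
print(f"BMI SD per risk allele, non-cases only: {slope_controls:+.4f}")
```

In the full sample the slope should sit near zero, because genotype and BMI are generated independently. Among non-cases, carriers of the risk allele can only have escaped the disease by having lower BMI on average, which is the collider-type selection that produces IEB.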