Siegert Sabine, Wolf Andreas, Cooper David N, Krawczak Michael, Nothnagel Michael
Cologne Center for Genomics, University of Cologne, Cologne, Germany; Institute of Epidemiology, Christian-Albrechts University, Kiel, Germany.
Institute of Medical Informatics and Statistics, Christian-Albrechts University, Kiel, Germany.
PLoS One. 2015 Jul 10;10(7):e0132150. doi: 10.1371/journal.pone.0132150. eCollection 2015.
Guided by the practice of classical epidemiology, research into the genetic basis of complex disease has usually taken for granted the dictum that causative mutations are invariably over-represented among clinically affected as compared to unaffected individuals. However, we show that this supposition is not true and that a mutation contributing to the etiology of a complex disease can, under certain circumstances, be depleted among patients. Populations with defined disease prevalence were repeatedly simulated under a Wright-Fisher model, assuming various types of population history and genotype-phenotype relationship. For each simulation, the resulting mutation-specific population frequencies and odds ratios (ORs) were evaluated. In addition, the relationship between mutation frequency and OR was studied using real data from the NIH GWAS catalogue of reported phenotype associations of single-nucleotide polymorphisms (SNPs). While rare diseases (prevalence <1%) were found to be consistently caused by rare mutations with ORs>1, up to 20% of mutations causing a pandemic disease (prevalence 10-20%) had ORs<1, and their population frequency ranged from 0% to 100%. Moreover, simulation-based ORs exhibited a wide distribution, irrespective of mutation frequency. In conclusion, a substantial proportion of mutations causing common complex diseases may appear 'protective' in genetic epidemiological studies and hence would normally tend to be excluded, albeit erroneously, from further study. This apparently paradoxical result is explicable in terms of mutual confounding of the respective genotype-phenotype relationships due to a negative correlation between causal mutations induced by their common gene genealogy. As would be predicted by our findings, a significant negative correlation became apparent in published genome-wide association studies between the OR of genetic variants associated with a particular disease and the prevalence of that disease.
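The confounding mechanism described above can be illustrated with a small deterministic calculation. The sketch below is not the authors' Wright-Fisher simulation; it is a toy haploid model under stated assumptions: two causal mutations, A and B, sit on mutually exclusive haplotypes (the extreme case of negative correlation), and all haplotype frequencies and disease risks are hypothetical values chosen only for illustration. Mutation A doubles the risk relative to the mutation-free haplotype, yet its marginal OR falls below 1 because most non-carriers of A carry the stronger mutation B.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def marginal_odds_ratio(freqs, risks, carrier):
    """Odds ratio of disease for carriers of one haplotype versus all others.

    freqs: haplotype -> population frequency (assumed to sum to 1)
    risks: haplotype -> probability of disease given that haplotype
    """
    risk_carrier = risks[carrier]
    # Non-carriers are a frequency-weighted mixture of the other haplotypes.
    rest_freq = sum(f for h, f in freqs.items() if h != carrier)
    rest_cases = sum(f * risks[h] for h, f in freqs.items() if h != carrier)
    risk_rest = rest_cases / rest_freq
    return odds(risk_carrier) / odds(risk_rest)


# Hypothetical values: A and B never occur on the same haplotype.
freqs = {"none": 0.3, "A": 0.2, "B": 0.5}
# Both mutations raise risk above the baseline of the "none" haplotype.
risks = {"none": 0.05, "A": 0.10, "B": 0.30}

prevalence = sum(freqs[h] * risks[h] for h in freqs)
or_a = marginal_odds_ratio(freqs, risks, "A")
or_b = marginal_odds_ratio(freqs, risks, "B")

print(f"prevalence = {prevalence:.3f}")
print(f"OR for A   = {or_a:.3f}")
print(f"OR for B   = {or_b:.3f}")
```

With these assumed numbers the disease prevalence lies in the 10-20% range discussed in the abstract, and A appears 'protective' in the marginal comparison although it is causal. Lowering the frequency or the risk of B far enough moves the OR for A back above 1.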