Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory, Cambridge Institute for Medical Research, Cambridge University, United Kingdom.
Genet Epidemiol. 2012 May;36(4):409-18. doi: 10.1002/gepi.21635. Epub 2012 Apr 16.
"Complex" diseases are, by definition, influenced by multiple causes, both genetic and environmental, and statistical work on the joint action of multiple risk factors has, for more than 40 years, been dominated by the generalized linear model (GLM). In genetics, models for dichotomous traits have traditionally been approached via the model of an underlying, normally distributed, liability. This corresponds to the GLM with binomial errors and a probit link function. Elsewhere in epidemiology, however, the logistic regression model, a GLM with logit link function, has been the tool of choice, largely because of its convenient properties in case-control studies. The choice of link function has usually been dictated by mathematical convenience, but it has some important implications in (a) the choice of association test statistic in the presence of existing strong risk factors, (b) the ability to predict disease from genotype given its heritability, and (c) the definition, and interpretation of epistasis (or epistacy). These issues are reviewed, and a new association test proposed.
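The dependence of "epistasis" on the link function, point (c) of the abstract, can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes two hypothetical binary risk factors whose effects are exactly additive on the liability (probit) scale, with an intercept and effect sizes chosen purely for illustration, and then measures the interaction contrast on both the probit and the logit scale.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal liability: mean 0, sd 1


def probit(p):
    """Probit link: maps a risk to the liability scale (inverse normal CDF)."""
    return _STD_NORMAL.inv_cdf(p)


def logit(p):
    """Logit link: maps a risk to the log-odds scale."""
    return log(p / (1.0 - p))


def risk(a, b, intercept=-2.0, beta_a=0.5, beta_b=0.5):
    """Disease risk under a liability-threshold (probit) model with no
    interaction term. The intercept and betas are illustrative assumptions."""
    return _STD_NORMAL.cdf(intercept + beta_a * a + beta_b * b)


def interaction_contrast(link):
    """Double difference of link-transformed risks over the 2x2 table of
    binary risk factors; zero means no interaction on that scale."""
    return (link(risk(1, 1)) - link(risk(1, 0))
            - link(risk(0, 1)) + link(risk(0, 0)))


if __name__ == "__main__":
    print("probit-scale contrast:", interaction_contrast(probit))
    print("logit-scale contrast: ", interaction_contrast(logit))
```

Under these assumed values the probit-scale contrast is zero up to rounding error, while the logit-scale contrast is clearly nonzero (negative). The same risks therefore show "no epistasis" under one link and apparent epistasis under the other.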