Li Z, Gastwirth J L, Gail M H
Department of Statistics, George Washington University, 2201 G Street NW, Washington, DC 20052, USA.
Ann Hum Genet. 2005 May;69(Pt 3):296-314. doi: 10.1046/j.1529-8817.2005.00169.x.
Both population-based and family-based case-control studies are used to test whether particular genotypes are associated with disease. While population-based studies have more power, cryptic population stratification can produce false-positive results. Family-based methods have been introduced to control for this problem. This paper presents the full likelihood function for family-based association studies of nuclear families ascertained on the basis of their numbers of affected and unaffected children. The likelihood of a family factors into the probability of the parental mating type, conditional on offspring phenotypes, times the probability of the offspring genotypes given their phenotypes and the parental mating type. The first factor can be influenced by population stratification, whereas the second factor, called the conditional likelihood, cannot. The conditional likelihood is used to obtain score tests with proper size in the presence of population stratification (see also Clayton (1999) and Whittemore & Tu (2000)). Under either the additive or multiplicative model, the TDT is known to be the optimal score test when the family has only one affected child. Thus, the class of score tests explored can be considered a general family of TDT-like procedures. The relative informativeness of the various mating types is assessed using the Fisher information, which depends on the numbers of affected and unaffected offspring and on the penetrances. When the additive model is true, families with parental mating type Aa x Aa are most informative. Under the dominant (recessive) model, however, a family with mating type Aa x aa (AA x Aa) is more informative than a family with doubly heterozygous (Aa x Aa) parents. Because we derive explicit formulae for all components of the likelihood, we are able to present tables giving required sample sizes for dominant, additive and recessive inheritance models.
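For orientation, the classic TDT that the abstract cites as the single-affected-child special case can be sketched as below. This is only the standard McNemar-type TDT statistic, not the general score tests derived in the paper. The function name `tdt_statistic` and the transmission counts are hypothetical, chosen purely for illustration.

```python
import math


def tdt_statistic(transmitted_A, transmitted_a):
    """Classic TDT chi-square from transmissions by heterozygous (Aa) parents.

    transmitted_A: number of Aa parents who passed allele A to the affected child
    transmitted_a: number of Aa parents who passed allele a to the affected child
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    informative = transmitted_A + transmitted_a
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted_A - transmitted_a) ** 2 / informative
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60 transmissions of A versus 40 of a.
chi2, p_value = tdt_statistic(60, 40)
print(chi2, p_value)  # chi2 is 4.0 for these counts
```

Only heterozygous parents contribute to the counts, which is why the statistic is unaffected by how mating types are distributed across subpopulations.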