Lim Sooyeol, Beyene Joseph, Greenwood Celia M T
The Hospital for Sick Children Research Institute.
Stat Appl Genet Mol Biol. 2005;4:Article20. doi: 10.2202/1544-6115.1140. Epub 2005 Aug 15.
We propose a multinomial logistic regression method that permits estimation and likelihood ratio tests for allele effects and their interactions with continuous covariates, as well as assessment of the degree of population stratification, in genetic association studies of case-parent triads. Our approach overcomes the constraint imposed by the categorical nature of explanatory variables in the log-linear model. We also demonstrate that the multinomial logistic method can yield efficient inference in the presence of missing parental genotype data via the Expectation-Maximization (EM) algorithm. We performed simulations to compare the multinomial logistic model with the case-pseudosibling conditional logistic model, both of which permit the incorporation of continuous covariates. Simulation results indicate that the two models lead to similar estimates in large samples. A simulation-based method of sample size estimation also shows that the two models are approximately equivalent in their sample size requirements. When parental genotype data are missing, either completely at random or dependent on covariates, the use of the EM algorithm gives the multinomial logistic model greater power. Because the multinomial logistic model offers the possibility of assessing the degree of population stratification in the sample and can also provide efficient inference in the presence of missing parental genotypes, the proposed model has an important application in epidemiological family-based association studies.
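To make the case-pseudosibling comparison concrete, the following minimal sketch shows how pseudosibling genotypes are usually constructed for a single triad at a biallelic marker: the four genotypes the parents could have transmitted are enumerated, and the one copy matching the affected child is set aside as the case. The function names and the tuple encoding of genotypes are illustrative assumptions, not taken from the paper, and the sketch covers only the construction of the matched set, not the model fitting or the EM step.

```python
from itertools import product


def offspring_genotypes(mother, father):
    """Return the four equally likely offspring genotypes under Mendelian
    transmission. Each genotype is an unordered pair of alleles, stored as
    a sorted tuple (encoding chosen for illustration only)."""
    return [tuple(sorted((m, f))) for m, f in product(mother, father)]


def pseudo_siblings(mother, father, case):
    """Return the three genotypes the case could have inherited but did not.

    These serve as matched controls for the case in a conditional logistic
    analysis, one matched set per triad."""
    possible = offspring_genotypes(mother, father)
    case = tuple(sorted(case))
    if case not in possible:
        raise ValueError("case genotype is not compatible with the parents")
    # Drop exactly one copy: the transmission that produced the case.
    possible.remove(case)
    return possible


if __name__ == "__main__":
    # Hypothetical triad: both parents heterozygous, affected child heterozygous.
    print(pseudo_siblings(("A", "a"), ("A", "a"), ("A", "a")))
```

In an analysis of this kind, each pseudosibling would typically be assigned the case's own covariate value, so a continuous covariate can enter the model only through its interaction with genotype.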