Department of Mathematics, Physics, and Statistics, University of the Sciences, Philadelphia, Pennsylvania 19104, USA.
Genet Epidemiol. 2011 Jul;35(5):291-302. doi: 10.1002/gepi.20577. Epub 2011 Apr 4.
Understanding and modeling the genetic and nongenetic factors that influence susceptibility to complex traits has been the focus of many genetic studies. Large pedigrees with known complex structure may be advantageous in epidemiological studies because they can significantly increase the number of factors whose influence on the trait can be estimated. We propose a likelihood approach, developed in the context of generalized linear mixed models, for modeling dichotomous traits based on data from hundreds of individuals, all of whom are potentially correlated through either a known pedigree or an estimated covariance matrix. Our approach is based on a hierarchical model in which we first assess the probability of each individual having the trait and then formulate a likelihood assuming conditional independence of individuals. The advantage of this formulation is that it easily incorporates information from pertinent covariates as fixed effects while also accounting for the correlation between individuals who share genetic background or other random effects. The high dimensionality of the integration involved in the likelihood makes exact computation infeasible. Instead, an automated Monte Carlo expectation maximization algorithm is employed to obtain the maximum likelihood estimates of the model parameters. Through a simulation study we demonstrate that our method can provide reliable estimates of the model parameters when the sample size is close to 500. Application of our method to data from a pedigree of 491 Hutterites evaluated for Type 2 diabetes (T2D) reveals evidence of a strong genetic component to T2D risk, particularly for younger and leaner cases.
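To make the hierarchical structure concrete, the following is a minimal Python sketch of the kind of model the abstract describes, not the authors' implementation. It rests on several assumptions that the abstract does not state: a logistic link, a single random effect with covariance `sigma2 * K`, and a toy relatedness matrix `K`. The function names `simulate_trait` and `mc_log_likelihood` are hypothetical. The second function uses plain Monte Carlo integration over the random effects to approximate the marginal likelihood. It is a simpler stand-in for the Monte Carlo EM algorithm named in the abstract and would be too noisy for a pedigree of several hundred individuals.

```python
import numpy as np


def _random_effect_factor(K, sigma2):
    """Cholesky factor of sigma2 * K, with a small jitter for numerical stability."""
    n = K.shape[0]
    return np.linalg.cholesky(sigma2 * K + 1e-10 * np.eye(n))


def simulate_trait(X, beta, K, sigma2, rng):
    """Draw a dichotomous trait from the assumed hierarchical model.

    Level 1: correlated random effects u ~ N(0, sigma2 * K).
    Level 2: given u, individuals are independent Bernoulli with
             logit(p_i) = x_i' beta + u_i  (logistic link is an assumption).
    """
    n = X.shape[0]
    u = _random_effect_factor(K, sigma2) @ rng.standard_normal(n)
    p = 1.0 / (1.0 + np.exp(-(X @ beta + u)))
    y = (rng.random(n) < p).astype(int)
    return y, p


def mc_log_likelihood(y, X, beta, K, sigma2, rng, n_draws=2000):
    """Plain Monte Carlo estimate of the marginal log-likelihood.

    Averages the conditional likelihood prod_i P(y_i | u) over draws of u.
    """
    n = X.shape[0]
    L = _random_effect_factor(K, sigma2)
    U = rng.standard_normal((n_draws, n)) @ L.T   # each row is one draw of u
    eta = X @ beta + U                            # linear predictor per draw
    # Bernoulli log-likelihood under a logit link: y*eta - log(1 + exp(eta))
    cond = (y * eta - np.logaddexp(0.0, eta)).sum(axis=1)
    m = cond.max()                                # log-sum-exp for stability
    return m + np.log(np.mean(np.exp(cond - m)))


# Toy example: three sibling pairs, relatedness 0.5 within each pair (assumed).
rng = np.random.default_rng(1)
K = np.kron(np.eye(3), np.array([[1.0, 0.5], [0.5, 1.0]]))
X = np.column_stack([np.ones(6), rng.standard_normal(6)])  # intercept + covariate
beta = np.array([-0.5, 1.0])
y, p = simulate_trait(X, beta, K, sigma2=1.0, rng=rng)
print(mc_log_likelihood(y, X, beta, K, sigma2=1.0, rng=rng))
```

The integral approximated here has one dimension per individual, which is why the abstract turns to a Monte Carlo EM scheme rather than exact evaluation for a pedigree of roughly 500 people.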