Biswas Swati, Lin Shili
Department of Statistics, Ohio State University, Columbus, Ohio 43210, USA.
Genet Epidemiol. 2004 Apr;26(3):206-17. doi: 10.1002/gepi.10314.
Locus heterogeneity is a major problem plaguing the mapping of disease genes responsible for complex genetic traits via linkage analysis. A common feature of several available methods to account for heterogeneity is that they involve maximizing a multidimensional likelihood to obtain maximum likelihood estimates. The high dimensionality of the likelihood surface may be due to multiple heterogeneity (mixing) parameters, linkage parameters, and/or regression coefficients corresponding to multiple covariates. Here, we focus on this nontrivial computational aspect of incorporating heterogeneity by considering several likelihood maximization procedures, including the expectation maximization (EM) algorithm and the stochastic expectation maximization (SEM) algorithm. The wide applicability of these procedures is demonstrated first through a general formulation of accounting for heterogeneity, and then by applying them to two specific formulations. Furthermore, our simulation studies, as well as an application to the Genetic Analysis Workshop 12 asthma datasets, show that, among other findings, SEM performs better than EM. As an aside, we illustrate a limitation, proved elsewhere, of the popular admixture approach for incorporating heterogeneity. We also show how to obtain standard errors (SEs) for EM and SEM estimates, using methods available in the literature. These SEs can then be combined with the corresponding estimates to provide confidence intervals for the parameters.
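To make the contrast between EM and SEM concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest admixture-type setting: a single mixing parameter alpha (the proportion of linked families) and per-family likelihood ratios of linked versus unlinked that have already been computed at a fixed recombination fraction. The function names (`e_step`, `em_alpha`, `sem_alpha`, `wald_interval`), the input values, and the boundary clamp in the SEM loop are all illustrative assumptions. The formulations in the paper involve further linkage parameters and covariates that are omitted here.

```python
import random


def e_step(alpha, lrs):
    """Posterior probability that each family is of the linked type,
    given the current alpha and per-family likelihood ratios (assumed input)."""
    return [alpha * lr / (alpha * lr + (1.0 - alpha)) for lr in lrs]


def em_alpha(lrs, alpha=0.5, tol=1e-8, max_iter=1000):
    """EM: replace the unknown linked/unlinked labels by their expectations."""
    for _ in range(max_iter):
        weights = e_step(alpha, lrs)
        new_alpha = sum(weights) / len(weights)  # M-step: mean posterior weight
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha


def sem_alpha(lrs, alpha=0.5, n_iter=2000, burn_in=500, seed=0):
    """SEM: replace the unknown labels by random draws, then average
    the post-burn-in values of alpha."""
    rng = random.Random(seed)
    draws = []
    for it in range(n_iter):
        weights = e_step(alpha, lrs)
        # S-step: simulate each family's label from its posterior probability.
        labels = [1 if rng.random() < w else 0 for w in weights]
        alpha = sum(labels) / len(labels)  # M-step on the completed data
        # Assumption of this sketch: clamp alpha away from 0 and 1 so the
        # chain cannot get stuck at a boundary value.
        alpha = min(max(alpha, 1e-6), 1.0 - 1e-6)
        if it >= burn_in:
            draws.append(alpha)
    return sum(draws) / len(draws)


def wald_interval(estimate, se, z=1.96):
    """Approximate 95% confidence interval from an estimate and its SE."""
    return (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    # Made-up likelihood ratios for eight families, for illustration only.
    example_lrs = [5.0, 0.2, 3.0, 0.5, 8.0, 0.1, 4.0, 0.3]
    print("EM estimate of alpha: ", em_alpha(example_lrs))
    print("SEM estimate of alpha:", sem_alpha(example_lrs))
```

The only difference between the two loops is how the missing labels are handled: EM uses their expected values, while SEM draws them at random, so its iterates fluctuate rather than settle at a single point. That is why the sketch averages the draws after a burn-in period. `wald_interval` shows one standard way an SE could be combined with an estimate; how the SEs themselves are obtained is not covered by this sketch.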