Minică Camelia C, Dolan Conor V, Kampert Maarten M D, Boomsma Dorret I, Vink Jacqueline M
Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
Mathematical Institute, Leiden University, Leiden, The Netherlands.
Eur J Hum Genet. 2015 Mar;23(3):388-94. doi: 10.1038/ejhg.2014.94. Epub 2014 Jun 11.
Given the availability of genotype and phenotype data collected in family members, the question arises which estimator makes the best use of such data in genome-wide scans. Using simulations, we compared the Unweighted Least Squares (ULS) and Maximum Likelihood (ML) procedures. The former is implemented in Plink and uses a sandwich correction to adjust the standard errors for the model misspecification that results from ignoring the clustering. The latter is implemented in fast linear mixed procedures and explicitly models the familial resemblance. However, because it commits to a background model limited to additive genetic and unshared environmental effects, it employs a misspecified model for traits with a shared environmental component. We evaluated the two procedures in terms of type I and type II error rates, with correct and incorrect model specification in ML. For traits characterized by moderate to large familial resemblance, an ML procedure with a correctly specified model for the conditional familial covariance matrix should be the strategy of choice. The computational convenience of the sandwich-corrected ULS procedure does not outweigh its potential loss in power. Furthermore, in the simulated settings the ML procedure was quite robust to model misspecification and appreciably more powerful than the sandwich-corrected ULS procedure. However, to correct for the effects of model misspecification in ML in circumstances other than those considered here, we propose using a sandwich correction. We show that the sandwich correction can be formulated in terms of the fast ML method.
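As an illustration of the sandwich-corrected least-squares idea described in the abstract, the following is a minimal NumPy sketch, not the Plink implementation and not the authors' code. It fits an ordinary (unweighted) least-squares regression of a phenotype on a genotype while ignoring family clustering, then computes cluster-robust standard errors by summing score contributions within each family. The function name `ols_cluster_sandwich`, the simulated data, and all parameter values are assumptions made for this example, and no small-sample correction is applied.

```python
import numpy as np


def ols_cluster_sandwich(X, y, cluster_ids):
    """Return OLS estimates with naive and cluster-robust (sandwich) standard errors.

    Hypothetical helper for illustration only; no small-sample correction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    n, p = X.shape

    # "Bread": inverse of X'X, shared by the naive and robust variance estimates.
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Naive variance assumes independent, homoscedastic residuals.
    sigma2 = resid @ resid / (n - p)
    se_naive = np.sqrt(np.diag(sigma2 * bread))

    # "Meat": sum over families of the outer product of the family score X_f' e_f.
    meat = np.zeros((p, p))
    for fam in np.unique(cluster_ids):
        idx = cluster_ids == fam
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    se_robust = np.sqrt(np.diag(bread @ meat @ bread))
    return beta, se_naive, se_robust


# Simulated example (all values are assumptions): 200 families of 4 members,
# a shared family effect inducing within-family phenotypic resemblance.
rng = np.random.default_rng(0)
n_fam, fam_size = 200, 4
family = np.repeat(np.arange(n_fam), fam_size)
genotype = rng.binomial(2, 0.3, size=n_fam * fam_size).astype(float)
shared = np.repeat(rng.normal(0.0, 1.0, n_fam), fam_size)
phenotype = 0.1 * genotype + shared + rng.normal(0.0, 1.0, n_fam * fam_size)

X = np.column_stack([np.ones_like(genotype), genotype])
beta, se_naive, se_robust = ols_cluster_sandwich(X, phenotype, family)
print("estimates:", beta)
print("naive SE:", se_naive)
print("cluster-robust SE:", se_robust)
```

The point estimates are the same as in a plain least-squares fit; only the standard errors change. An ML approach, by contrast, would model the familial covariance explicitly, which is the comparison the abstract reports.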