Tapsoba Jean de Dieu, Kooperberg Charles, Reiner Alexander, Wang Ching-Yun, Dai James Y
Am J Epidemiol. 2014 May 15;179(10):1264-72. doi: 10.1093/aje/kwu039. Epub 2014 Apr 9.
Secondary trait genetic association provides insight into the genetic architecture of disease etiology but requires caution in estimation. Ignoring case-control sampling may introduce bias into secondary trait association. In this paper, we compare the efficiency and robustness of various inverse probability weighted (IPW) estimators and maximum likelihood (ML) estimators. ML methods have been proposed but require correct modeling of both the secondary and the primary trait associations for valid inference. We show that ML methods using a misspecified primary trait model can severely inflate the type I error. IPW estimators are typically less efficient than ML estimators but are robust against model misspecification. When the secondary trait is available for the entire cohort, the IPW estimator with selection probabilities estimated nonparametrically and the augmented IPW estimator improve efficiency over the simple IPW estimator. We conclude that in large genetic association studies with complex sampling schemes, IPW-based estimators offer flexibility and robustness, and therefore are a viable option for analysis.
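The contrast between ignoring case-control sampling and using a simple IPW estimator can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a hypothetical cohort in which the disease risk depends on both the genotype and the secondary trait, with all cases sampled and controls sampled at a known probability. Every name and parameter value (`TRUE_BETA`, the allele frequency of 0.3, the logistic coefficients, `run_demo`) is an illustrative assumption rather than a value from the paper. It covers only the simple IPW estimator with known selection probabilities; the nonparametrically estimated and augmented variants are not shown.

```python
import math
import random

TRUE_BETA = 0.2  # assumed per-allele effect of genotype on the secondary trait


def simulate_cohort(n, rng):
    """Simulate (genotype, secondary trait, disease status) for a full cohort."""
    cohort = []
    for _ in range(n):
        g = (rng.random() < 0.3) + (rng.random() < 0.3)  # additive genotype 0/1/2
        y = TRUE_BETA * g + rng.gauss(0.0, 1.0)          # secondary trait
        # Assumed primary trait model: risk depends on both trait and genotype.
        p_disease = 1.0 / (1.0 + math.exp(-(-4.0 + y + g)))
        d = rng.random() < p_disease
        cohort.append((g, y, d))
    return cohort


def case_control_sample(cohort, rng):
    """Keep every case; keep each control with a known probability."""
    n_cases = sum(d for _, _, d in cohort)
    pi_control = n_cases / (len(cohort) - n_cases)
    sample = []
    for g, y, d in cohort:
        pi = 1.0 if d else pi_control  # selection probability given disease status
        if rng.random() < pi:
            sample.append((g, y, pi))
    return sample


def weighted_slope(g, y, w):
    """Weighted least-squares slope of y on g, with an intercept."""
    sw = sum(w)
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (gi - g_bar) * (yi - y_bar) for wi, gi, yi in zip(w, g, y))
    den = sum(wi * (gi - g_bar) ** 2 for wi, gi in zip(w, g))
    return num / den


def run_demo(seed=1, n=100000):
    """Return (naive, IPW) slope estimates from one simulated case-control sample."""
    rng = random.Random(seed)
    sample = case_control_sample(simulate_cohort(n, rng), rng)
    g = [s[0] for s in sample]
    y = [s[1] for s in sample]
    naive = weighted_slope(g, y, [1.0] * len(sample))      # ignores the sampling
    ipw = weighted_slope(g, y, [1.0 / s[2] for s in sample])  # inverse probability weights
    return naive, ipw


naive_estimate, ipw_estimate = run_demo()
print("true effect:", TRUE_BETA)
print("naive estimate:", round(naive_estimate, 3))
print("IPW estimate:", round(ipw_estimate, 3))
```

In this setup the unweighted estimate pools cases and controls as if they were a random sample, so it absorbs the association induced by selecting on disease status. Weighting each subject by the inverse of its selection probability restores the cohort-level estimating equation, at the cost of a larger variance because the sampled controls carry large weights.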