Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts 02115, USA.
Genet Epidemiol. 2009 Dec;33(8):717-28. doi: 10.1002/gepi.20424.
Genome-wide association studies (GWAS) require considerable investment, so researchers often study multiple traits collected on the same set of subjects to maximize return. However, many GWAS have adopted a case-control design, and failing to account properly for case-control ascertainment can bias estimates of the association between markers and secondary traits. We show that under the null hypothesis of no marker-secondary trait association, naïve analyses that either ignore ascertainment or stratify on case-control status have proper Type I error rates, except when the marker and the secondary trait are both independently associated with disease risk. Under the alternative hypothesis, these methods are unbiased when the secondary trait is not associated with disease risk. We also show that inverse-probability-of-sampling-weighted (IPW) regression provides unbiased estimates of marker-secondary trait association. We use simulation to quantify the Type I error, power and bias of the naïve and IPW methods. IPW regression has appropriate Type I error in all situations we consider, but has lower power than the naïve analyses. The bias of the naïve analyses is small provided the marker is independent of disease risk. Because the majority of markers tested in a GWAS are not associated with disease risk, naïve analyses provide valid tests of, and nearly unbiased estimates of, marker-secondary trait association. Care must be taken when there is evidence that both the secondary trait and the tested marker are associated with the primary disease, a situation we illustrate by analyzing the relationship between a marker in FGFR2 and mammographic density in a breast cancer case-control sample.
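The contrast between naïve and IPW regression can be illustrated with a small simulation. The sketch below is not the authors' code; it assumes a hypothetical setup: an additively coded SNP `g` with minor allele frequency 0.3, a normally distributed secondary trait `y = beta_gy * g + noise`, and a logistic disease model with intercept -4 (a rare disease) and log-odds effects `gamma_g` and `gamma_y`. Equal numbers of cases and controls are drawn from a simulated source population. In the IPW fit, each sampled subject is weighted by the inverse of its probability of being sampled given disease status (number sampled divided by number available in that stratum), which is known here only because the whole population is simulated. All function names and parameter values are illustrative.

```python
import numpy as np


def naive_slope(y, g):
    """Ordinary least-squares slope of y on g, ignoring how subjects were sampled."""
    X = np.column_stack([np.ones_like(g), g])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def ipw_slope(y, g, d, frac_case, frac_control):
    """Weighted least-squares slope of y on g, weighting each subject by
    1 / P(sampled | disease status)."""
    # Under case-control sampling, the selection probability depends only on d (0/1).
    w = 1.0 / np.where(d == 1, frac_case, frac_control)
    X = np.column_stack([np.ones_like(g), g])
    XtW = X.T * w  # X'W without forming the n x n diagonal weight matrix
    beta = np.linalg.solve(XtW @ X, XtW @ y)  # weighted normal equations
    return beta[1]


def simulate(beta_gy, gamma_g, gamma_y, n_pop=2_000_000, n_per_group=10_000,
             maf=0.3, seed=1):
    """Draw a hypothetical source population, sample equal numbers of cases and
    controls, and return (naive, IPW) estimates of the marker-trait slope."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n_pop).astype(float)  # additive SNP coding
    y = beta_gy * g + rng.standard_normal(n_pop)  # secondary trait
    # Assumed logistic disease model; intercept -4 keeps the disease rare.
    eta = -4.0 + gamma_g * g + gamma_y * y
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    cases, controls = np.flatnonzero(d == 1), np.flatnonzero(d == 0)
    idx = np.concatenate([rng.choice(cases, n_per_group, replace=False),
                          rng.choice(controls, n_per_group, replace=False)])
    # Sampling fractions are known because the full population was simulated.
    frac_case = n_per_group / cases.size
    frac_control = n_per_group / controls.size
    ys, gs, ds = y[idx], g[idx], d[idx]
    return naive_slope(ys, gs), ipw_slope(ys, gs, ds, frac_case, frac_control)


if __name__ == "__main__":
    # Marker and secondary trait both raise disease risk.
    print(simulate(beta_gy=0.2, gamma_g=0.5, gamma_y=0.5))
    # Secondary trait unrelated to disease; only the marker raises risk.
    print(simulate(beta_gy=0.2, gamma_g=0.5, gamma_y=0.0))
```

Under these assumptions the true population slope is 0.2. In the first scenario, the naïve slope should sit noticeably above 0.2, because cases have higher average marker and trait values and pooling them with controls adds a between-group component; the IPW slope should stay near 0.2 up to sampling noise, with a wider spread because the few cases receive small weights. In the second scenario, disease depends only on the marker, so the trait's residual is independent of case status and both estimates should be close to 0.2.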