Robustness of case-control studies of genetic factors to population stratification: magnitude of bias and type I error.

Authors

Khlat Myriam, Cazes Marie-Hélène, Génin Emmanuelle, Guiguet Marguerite

Affiliation

Institut National d'Etudes Démographiques, 133 Boulevard Davout, 75980 Paris Cedex 20, France.

Publication

Cancer Epidemiol Biomarkers Prev. 2004 Oct;13(10):1660-4.

PMID:15466984

Abstract

Case-control studies of genetic factors are prone to a special form of confounding called population stratification, whenever the existence of one or more subpopulations may lead to a false association, be it positive or negative. We quantify both the bias (in terms of confounding risk ratio) and the probability of false association (type I error) in the most unfavorable situation in which only one high-risk subpopulation is hidden within the studied population, considering different scenarios of population structuring and varying sample sizes. In accord with previous work, we find that the bias is likely to be small in most cases. In addition, we show that the same applies to the associated type I error whenever the subpopulation is small in proportion. For instance, when the hidden subpopulation makes up 5% of the entire population, with an allelic frequency of 0.25 (versus 0.10) and a disease rate that is double, then the estimated bias is 1.07 and the type I error associated with a sample of 500 cases and 500 controls is 8% (instead of 5%). We also show that the type I error is substantially greater for a rare allele (frequency of 0.1) than for a common allele (frequency of 0.5) and analyze the pattern of increase of vulnerability to stratification bias with sample size. Based on our findings, we may therefore conclude that with moderate sample sizes the type I error associated with population stratification remains very limited in most realistic scenarios.
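The worked example in the abstract (a hidden subpopulation of 5%, allele frequency 0.25 versus 0.10, doubled disease rate, 500 cases and 500 controls) can be roughly reproduced with a short calculation. The sketch below is not the authors' method. It assumes a rare disease (so controls mirror the whole population), no true allele effect within either subpopulation, an allele-count comparison (two alleles per subject), and a two-sided normal approximation for the test. All function and parameter names are hypothetical.

```python
from statistics import NormalDist


def allele_freqs(w, q_sub, q_main, rate_ratio):
    """Expected allele frequencies among cases and controls.

    w          : proportion of the hidden subpopulation
    q_sub      : allele frequency in the hidden subpopulation
    q_main     : allele frequency in the rest of the population
    rate_ratio : disease rate in the subpopulation relative to the rest
    Assumes a rare disease, so controls are treated as a population sample.
    """
    # Share of cases drawn from the hidden subpopulation.
    case_share = w * rate_ratio / (w * rate_ratio + (1 - w))
    q_cases = case_share * q_sub + (1 - case_share) * q_main
    q_controls = w * q_sub + (1 - w) * q_main
    return q_cases, q_controls


def confounding_risk_ratio(w, q_sub, q_main, rate_ratio):
    """Spurious association due to stratification alone, as an odds ratio."""
    q_cases, q_controls = allele_freqs(w, q_sub, q_main, rate_ratio)
    return (q_cases / (1 - q_cases)) / (q_controls / (1 - q_controls))


def approx_type_i_error(w, q_sub, q_main, rate_ratio,
                        n_cases, n_controls, alpha=0.05):
    """Approximate rejection probability of a two-sided test of equal
    allele frequencies when the only 'signal' is stratification."""
    q_cases, q_controls = allele_freqs(w, q_sub, q_main, rate_ratio)
    # Assumption: alleles are the counting unit (2 per subject).
    a_cases, a_controls = 2 * n_cases, 2 * n_controls
    se = (q_cases * (1 - q_cases) / a_cases
          + q_controls * (1 - q_controls) / a_controls) ** 0.5
    shift = (q_cases - q_controls) / se
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    return (1 - std.cdf(z - shift)) + std.cdf(-z - shift)


crr = confounding_risk_ratio(0.05, 0.25, 0.10, 2.0)
t1 = approx_type_i_error(0.05, 0.25, 0.10, 2.0, 500, 500)
print(f"confounding risk ratio: {crr:.2f}")
print(f"approximate type I error: {t1:.3f}")
```

Under these assumptions the confounding risk ratio comes out close to the 1.07 quoted in the abstract, and the approximate type I error lands near 8%. The second figure depends on treating alleles rather than subjects as the counting unit, which is an assumption of this sketch and may differ from the computation in the paper.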
