MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK.
Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
Int J Epidemiol. 2023 Feb 8;52(1):44-57. doi: 10.1093/ije/dyac221.
Non-random selection of analytic subsamples could introduce selection bias in observational studies. We explored the potential presence and impact of selection in studies of SARS-CoV-2 infection and COVID-19 prognosis.
We tested the association of a broad range of characteristics with selection into COVID-19 analytic subsamples in the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK Biobank (UKB). We then conducted empirical analyses and simulations to explore the potential presence, direction and magnitude of bias due to this selection (relative to our defined UK-based adult target populations) when estimating the association of body mass index (BMI) with SARS-CoV-2 infection and death-with-COVID-19.
In both cohorts, a broad range of characteristics was related to selection, sometimes in opposite directions (e.g. more-educated people were more likely to have data on SARS-CoV-2 infection in ALSPAC, but less likely in UKB). Higher BMI was associated with higher odds of SARS-CoV-2 infection and death-with-COVID-19. We found non-negligible bias in many simulated scenarios.
Analyses using self-reported or national registry COVID-19 data may be biased due to selection. The magnitude and direction of this bias depend on the outcome definition, the true effect of the risk factor and the assumed selection mechanism; these are likely to differ between studies with different target populations. Bias due to sample selection is a key concern in COVID-19 research based on national registry data, especially as countries end free mass testing. The framework we have used can be applied by other researchers to assess the extent to which their results may be biased for their research question of interest.
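As a rough illustration of how selection into an analytic subsample can distort an exposure-outcome association, the following minimal sketch simulates a target population and compares the odds ratio in the full population with the odds ratio among selected individuals only. It is not the simulation design used in the study: the binary "high BMI" exposure, the prevalence and intercept values, and the parameter names `true_log_or`, `sel_exposure` and `sel_outcome` are all hypothetical choices made for illustration.

```python
import numpy as np


def odds_ratio(exposed, outcome):
    """Cross-product odds ratio from two boolean arrays."""
    a = np.sum(exposed & outcome)      # exposed, with outcome
    b = np.sum(exposed & ~outcome)     # exposed, without outcome
    c = np.sum(~exposed & outcome)     # unexposed, with outcome
    d = np.sum(~exposed & ~outcome)    # unexposed, without outcome
    return (a * d) / (b * c)


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n=200_000, true_log_or=0.0, sel_exposure=1.0,
             sel_outcome=1.5, seed=0):
    """Simulate a target population and a non-randomly selected subsample.

    All numeric values are illustrative assumptions, not estimates
    from ALSPAC or UK Biobank.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical binary exposure: 30% of the population has high BMI.
    high_bmi = rng.random(n) < 0.3

    # Outcome (e.g. infection) depends on exposure via true_log_or.
    infected = rng.random(n) < expit(-2.0 + true_log_or * high_bmi)

    # Selection (e.g. being tested) depends on BOTH exposure and outcome,
    # so conditioning on it can induce a spurious association.
    p_select = expit(-3.0 + sel_exposure * high_bmi + sel_outcome * infected)
    selected = rng.random(n) < p_select

    return {
        "target_or": odds_ratio(high_bmi, infected),
        "selected_or": odds_ratio(high_bmi[selected], infected[selected]),
        "fraction_selected": selected.mean(),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

With `true_log_or=0.0` there is no association in the target population, so any departure of `selected_or` from 1 reflects selection alone. In this setup the odds ratio in the selected sample equals the target odds ratio multiplied by the cross-product ratio of the four selection probabilities, so varying `sel_exposure` and `sel_outcome` shows how the assumed selection mechanism drives both the direction and the magnitude of the bias.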