Department of Medical Sciences, University of Turin and CPO-Piemonte, Turin, Italy
Medical Statistics, London School of Hygiene & Tropical Medicine, London, UK.
J Epidemiol Community Health. 2019 May;73(5):475-480. doi: 10.1136/jech-2018-211829. Epub 2019 Feb 25.
There is debate as to whether cohort studies are valid when they are based on a source population that is not representative of a given general population. This baseline selection may introduce collider bias if the exposure of interest and some other risk factors for the outcome affect the probability of being in the source population, thus altering the associations between the exposure and those risk factors. We argue that this mechanism is not specific to 'selected cohorts' and also occurs in 'representative cohorts', because of the selection processes that occur in any population. These selection processes are linked, for example, to life status, immigration and emigration, which, in turn, may be affected by environmental and social determinants, lifestyles and genetics. We provide real-world examples of this phenomenon using data on the population of the Piedmont region, Italy. In addition to well-recognised mechanisms, such as shared common causes, the associations between the exposure of interest and the risk factors for the outcome of interest in any source population are potentially shaped by collider bias due to the underlying selection processes. We conclude that, when conducting a cohort study, different source populations, whether 'selected' or 'representative', may lead to different associations between the exposure and the outcome risk factors, and thus to different degrees of lack of exchangeability, but that one approach is not inherently more or less biased than the other. The key issue is whether the relevant risk factors can be identified and controlled.
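The collider mechanism described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis or the Piedmont data: the function names, the 50% prevalences and the selection probabilities are all hypothetical, chosen only so that an exposure and an outcome risk factor are independent in the full population while both raise the probability of entering the source population.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=200_000, seed=0):
    """Return (correlation in full population, correlation among the selected).

    All numeric values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    exposure, risk_factor, selected = [], [], []
    for _ in range(n):
        e = 1 if rng.random() < 0.5 else 0  # exposure, independent of r
        r = 1 if rng.random() < 0.5 else 0  # outcome risk factor
        # Selection is a collider: both e and r raise the probability
        # of being in the source population.
        p_select = 0.1 + 0.4 * e + 0.4 * r
        exposure.append(e)
        risk_factor.append(r)
        selected.append(rng.random() < p_select)

    full_corr = pearson(exposure, risk_factor)
    sel_e = [e for e, s in zip(exposure, selected) if s]
    sel_r = [r for r, s in zip(risk_factor, selected) if s]
    selected_corr = pearson(sel_e, sel_r)
    return full_corr, selected_corr


if __name__ == "__main__":
    full_corr, selected_corr = simulate()
    print(f"full population correlation:     {full_corr:.3f}")
    print(f"selected population correlation: {selected_corr:.3f}")
```

Under these assumed parameters the exposure and the risk factor are close to uncorrelated in the full population but become negatively correlated among those selected, which is the altered exposure-risk factor association the abstract refers to. Whether this matters for a given study depends, as the abstract notes, on whether the relevant risk factor can be identified and controlled.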