Norwegian Computing Center, Oslo, Norway.
Department of Method Development and Analytics, Norwegian Institute of Public Health, Oslo, Norway.
BMC Med Res Methodol. 2022 May 20;22(1):146. doi: 10.1186/s12874-022-01565-1.
Regression models are often used to explain the relative risk of infectious diseases among groups. For example, overrepresentation of immigrants among COVID-19 cases has been found in multiple countries. Several studies apply regression models to investigate whether various risk factors can explain this overrepresentation, but they do not account for dependence between the cases.
We use a simulation study to assess whether traditional statistical regression methods are appropriate for identifying risk factors for infectious diseases. We model infectious disease spread with a simple, population-structured version of the SIR (susceptible-infected-recovered) model, one of the best-known and most well-established models of infectious disease spread. The population is divided into sub-groups, and we vary the contact structure between these sub-groups. We analyse the relation between individual-level risk of infection and group-level relative risk, and whether Poisson regression estimators can capture the true, underlying parameters of transmission. We assess both the quantitative and qualitative accuracy of the estimated regression coefficients.
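The setup described above can be sketched roughly as follows. This is a minimal, deterministic two-group SIR model and not the authors' simulation code: the function name `simulate_group_sir`, the contact matrix, the group sizes, and all parameter values are hypothetical choices made for illustration. The last lines use the fact that a Poisson regression with a log link, a log-population offset, and a single binary group indicator estimates the log of the crude rate ratio, so that quantity can be compared directly with the true individual-level parameter.

```python
import numpy as np

def simulate_group_sir(beta, gamma, contact, susceptibility, n, i0,
                       dt=0.1, steps=5000):
    """Euler integration of a deterministic multi-group SIR model.

    contact[g, h] is the (assumed) share of group g's contacts made with
    group h; susceptibility[g] scales the individual-level risk per contact.
    """
    n = np.asarray(n, dtype=float)
    contact = np.asarray(contact, dtype=float)
    susceptibility = np.asarray(susceptibility, dtype=float)
    i = np.asarray(i0, dtype=float)
    s = n - i
    r = np.zeros_like(n)
    for _ in range(steps):
        # Force of infection on each group: depends on prevalence in the
        # groups it has contact with, which is what makes cases dependent.
        force = beta * susceptibility * (contact @ (i / n))
        new_infections = force * s * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
    return s, i, r

# Hypothetical example: two equally sized groups, 80% of contacts within
# the own group, group 1 with 20% higher individual susceptibility.
n = np.array([5000.0, 5000.0])
true_susceptibility = np.array([1.0, 1.2])
s, i, r = simulate_group_sir(
    beta=0.3, gamma=0.2,
    contact=[[0.8, 0.2], [0.2, 0.8]],
    susceptibility=true_susceptibility,
    n=n, i0=[5.0, 5.0],
)

attack_rate = (i + r) / n  # share of each group ever infected
# Coefficient a Poisson regression with only a group indicator would report.
log_rate_ratio = np.log(attack_rate[1] / attack_rate[0])
true_log_ratio = np.log(true_susceptibility[1] / true_susceptibility[0])
print("attack rates:", attack_rate)
print("regression-style log rate ratio:", log_rate_ratio)
print("true log susceptibility ratio:", true_log_ratio)
```

Changing `beta` or the contact matrix while leaving `true_susceptibility` fixed shows how the regression-style coefficient moves even though the individual-level parameter does not.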
We illustrate that there is no clear relationship between differences in individual characteristics and group-level overrepresentation: small differences at the individual level can result in arbitrarily high overrepresentation. We demonstrate that individual risk of infection cannot be properly defined without simultaneously specifying the infection level of the population. We argue that the estimated regression coefficients are not interpretable and show that it is not possible to adjust for other variables by standard regression methods. Finally, we illustrate in a constructed simulation example that regression models can find variables unrelated to infection risk (e.g. ethnicity) to be significant, particularly when a large proportion of contacts is within the same group.
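One way to see why a small individual-level difference can translate into very large group-level overrepresentation is the standard final-size relation of the SIR model. The sketch below is an illustration under strong assumptions, not the paper's simulation: the two groups are treated as fully isolated (all contacts within the own group), the reproduction numbers, the seed fraction, and the helper names `final_size` and `group_relative_risk` are hypothetical.

```python
import math

def final_size(r0, seed=0.001, iterations=5000):
    """Fraction ever infected in a closed, homogeneous SIR population.

    Solves z = 1 - (1 - seed) * exp(-r0 * z) by fixed-point iteration,
    starting from z = 1 so the iterates decrease towards the solution.
    """
    z = 1.0
    for _ in range(iterations):
        z = 1.0 - (1.0 - seed) * math.exp(-r0 * z)
    return z

def group_relative_risk(r0_reference, susceptibility_ratio=1.2):
    """Attack-rate ratio between two isolated groups (assumed setting)
    whose individual susceptibility differs by `susceptibility_ratio`."""
    return (final_size(r0_reference * susceptibility_ratio)
            / final_size(r0_reference))

# Same 20% individual-level difference, two different infection levels.
rr_near_threshold = group_relative_risk(0.95)  # groups straddle R0 = 1
rr_high_level = group_relative_risk(1.5)       # both groups well above 1
print("relative risk near the epidemic threshold:", rr_near_threshold)
print("relative risk at a high infection level:", rr_high_level)
```

The same individual-level ratio gives a much larger group-level relative risk near the epidemic threshold than at a high infection level, and lowering `seed` widens the gap further, which is consistent with the point that individual risk cannot be defined without also specifying the infection level of the population.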
Traditional regression models that are valid for modelling risk between groups for non-communicable diseases are not valid for infectious diseases. Applying such methods to identify risk factors for infectious diseases risks leading to wrong conclusions. Output from such analyses should therefore be treated with great caution.