COVID-19 and the epistemology of epidemiological models at the dawn of AI.

Author information

Ellison George T H

Affiliation

Centre for Data Innovation, Faculty of Science and Technology, University of Central Lancashire, Preston, UK.

Publication information

Ann Hum Biol. 2020 Sep;47(6):506-513. doi: 10.1080/03014460.2020.1839132.

Abstract

The models used to estimate disease transmission, susceptibility and severity determine what epidemiology can (and cannot) tell us about COVID-19. These include: 'model organisms' chosen for their phylogenetic/aetiological similarities; multivariable statistical models to estimate the strength/direction of (potentially causal) relationships between variables (through 'causal inference'), and the (past/future) value of unmeasured variables (through 'classification/prediction'); and a range of modelling techniques to predict beyond the available data (through 'extrapolation'), compare different hypothetical scenarios (through 'simulation'), and estimate key features of dynamic processes (through 'projection'). Each of these models: addresses different questions using different techniques; involves assumptions that require careful assessment; and is vulnerable to generic and specific biases that can undermine the validity and interpretation of its findings. It is therefore necessary that the models used: can actually address the questions posed; and have been competently applied. In this regard, it is important to stress that extrapolation, simulation and projection cannot offer accurate predictions of future events when the underlying mechanisms (and the contexts involved) are poorly understood and subject to change. Given the importance of understanding such mechanisms/contexts, and the limited opportunity for experimentation during outbreaks of novel diseases, the use of multivariable statistical models to estimate the strength/direction of potentially causal relationships between two variables (and the biases incurred through their misapplication/misinterpretation) warrants particular attention. 
Such models must be carefully designed to address: 'selection-collider bias', 'unadjusted confounding bias' and 'inferential mediator adjustment bias' - all of which can introduce effects capable of enhancing, masking or reversing the estimated (true) causal relationship between the two variables examined. Selection-collider bias occurs when these two variables independently cause a third (the 'collider'), and when this collider determines/reflects the basis for selection in the analysis. It is likely to affect all incompletely representative samples, although its effects will be most pronounced wherever selection is constrained (e.g. analyses focusing on infected/hospitalised individuals). Unadjusted confounding bias disrupts the estimated (true) causal relationship between two variables when: these share one (or more) common cause(s); and when the effects of these causes have not been adjusted for in the analyses (e.g. whenever confounders are unknown/unmeasured). Inferentially similar biases can occur when: one (or more) variable(s) (or 'mediators') fall on the causal path between the two variables examined (i.e. when such mediators are caused by one of the variables and are causes of the other); and when these mediators are adjusted for in the analysis. Such adjustment is commonplace when: mediators are mistaken for confounders; prediction models are mistakenly repurposed for causal inference; or mediator adjustment is used to estimate direct and indirect causal relationships (in a mistaken attempt at 'mediation analysis'). These three biases are central to ongoing and unresolved epistemological tensions within epidemiology. All have substantive implications for our understanding of COVID-19, and the future application of artificial intelligence to 'data-driven' modelling of similar phenomena. 
Nonetheless, competently applied and carefully interpreted, multivariable statistical models may yet provide sufficient insight into mechanisms and contexts to permit more accurate projections of future disease outbreaks.
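
The three biases described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the linear data-generating models, the coefficients (a true effect of 0.5 or 0), the selection threshold and all function names are assumptions chosen purely for illustration. Adjustment is done by regressing on residuals, which for a single covariate gives the same coefficient as a two-variable linear regression.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(v, z):
    """The part of v that is not linearly explained by z."""
    b = slope(z, v)
    mz, mv = mean(z), mean(v)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]


def adjusted_slope(x, y, z):
    """Slope of y on x after adjusting both for z (residual regression)."""
    return slope(residuals(x, z), residuals(y, z))


def simulate(n=20_000, seed=1):
    """Return estimated slopes under three hypothetical causal structures."""
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n)]

    out = {}

    # 1. Confounding: z causes both x and y; assumed true effect of x on y is 0.5.
    z = noise()
    x = [zi + e for zi, e in zip(z, noise())]
    y = [0.5 * xi + zi + e for xi, zi, e in zip(x, z, noise())]
    out["confounding_unadjusted"] = slope(x, y)      # biased away from 0.5
    out["confounding_adjusted"] = adjusted_slope(x, y, z)

    # 2. Mediation: x causes m, m causes y; assumed total effect of x on y is 0.5.
    x = noise()
    m = [xi + e for xi, e in zip(x, noise())]
    y = [0.5 * mi + e for mi, e in zip(m, noise())]
    out["mediator_unadjusted"] = slope(x, y)         # recovers the total effect
    out["mediator_adjusted"] = adjusted_slope(x, y, m)  # total effect is masked

    # 3. Selection on a collider: x and y are independent (true effect 0),
    #    but both raise the chance of being selected into the sample.
    x, y = noise(), noise()
    keep = [xi + yi + e > 1.0 for xi, yi, e in zip(x, y, noise())]
    xs = [xi for xi, k in zip(x, keep) if k]
    ys = [yi for yi, k in zip(y, keep) if k]
    out["collider_full_sample"] = slope(x, y)
    out["collider_selected"] = slope(xs, ys)         # spurious negative slope
    return out


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:26s} {value:+.3f}")
```

Under these assumed models, the unadjusted confounded slope should come out near 1.0 rather than 0.5, adjusting for the mediator should push a real total effect of 0.5 towards zero, and restricting the analysis to the selected subsample should produce a negative association between two variables that are in fact independent.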
