Tilling Kate, Williamson Elizabeth J, Spratt Michael, Sterne Jonathan A C, Carpenter James R
School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK.
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, University of London, Keppel Street, London WC1E 7HT, UK; Farr Institute of Health Informatics, London, University College London, 222 Euston Road, London NW1 2DA, UK.
J Clin Epidemiol. 2016 Dec;80:107-115. doi: 10.1016/j.jclinepi.2016.07.004. Epub 2016 Jul 19.
Missing data are a pervasive problem, often leading to bias in complete records analysis (CRA). Multiple imputation (MI) via chained equations is one solution, but its use in the presence of interactions is not straightforward.
We simulated data in which the outcome Y depended on binary explanatory variables X and Z and their interaction XZ. Six scenarios were simulated (Y continuous or binary, each with no, a weak, or a strong interaction) under five missing data mechanisms. We used directed acyclic graphs to identify when CRA and MI would each be unbiased. We evaluated the performance of CRA, MI without interactions, MI including all interactions, and stratified imputation. We also illustrated these methods using a simple example from the National Child Development Study (NCDS).
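The contrast between MI without interactions, MI including interactions, and stratified imputation can be sketched for one simple case: continuous Y, binary X and Z, and X missing at random given Y. This is an illustration under assumed settings, not the authors' simulation code: the coefficients (0.5, 0.5, and 1.0 for XZ), the sample size, the missingness model, M = 10, and the helper names fit_logistic, draw_x, impute_once and xz_estimate are all hypothetical. Each imputation fits a logistic model for X on the complete records, draws its coefficients from an approximate normal posterior, and then draws the missing X values; the analysis model Y ~ X + Z + XZ is refitted to each completed dataset and the estimates are averaged.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(design, outcome, n_iter=25):
    """Logistic regression by Newton-Raphson; returns estimates and their covariance."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        info = design.T @ (design * (p * (1 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, design.T @ (outcome - p))
    p = expit(design @ beta)
    info = design.T @ (design * (p * (1 - p))[:, None])
    return beta, np.linalg.inv(info)

def draw_x(d_obs, x_obs, d_mis, rng):
    """Fit the X-model to observed rows, draw its coefficients, then draw missing X."""
    beta, cov = fit_logistic(d_obs, x_obs)
    beta_star = rng.multivariate_normal(beta, cov)  # approximate posterior draw
    return rng.binomial(1, expit(d_mis @ beta_star)).astype(float)

def impute_once(y, x, z, miss, rng, model):
    """Return one completed copy of X under the chosen imputation model."""
    x_imp = x.copy()
    if model == "stratified":
        for level in (0.0, 1.0):  # separate X ~ Y model within each level of Z
            obs, mis = ~miss & (z == level), miss & (z == level)
            d_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
            d_mis = np.column_stack([np.ones(mis.sum()), y[mis]])
            x_imp[mis] = draw_x(d_obs, x[obs], d_mis, rng)
    else:
        cols = [np.ones_like(y), y, z]
        if model == "with_interaction":
            cols.append(y * z)  # Y-by-Z term in the imputation model for X
        d = np.column_stack(cols)
        x_imp[miss] = draw_x(d[~miss], x[~miss], d[miss], rng)
    return x_imp

def xz_estimate(y, x, z):
    """OLS fit of the analysis model Y ~ X + Z + XZ; returns the XZ coefficient."""
    design = np.column_stack([np.ones_like(y), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[3])

rng = np.random.default_rng(2016)
n, m_imp, true_xz = 20_000, 10, 1.0            # hypothetical settings
x_full = rng.binomial(1, 0.5, n).astype(float)
z = rng.binomial(1, 0.5, n).astype(float)
y = 0.5 * x_full + 0.5 * z + true_xz * x_full * z + rng.normal(size=n)
miss = rng.random(n) < expit(-1.0 + 0.5 * y)   # hypothetical MAR: depends on Y only
x = np.where(miss, np.nan, x_full)

results = {"cra": xz_estimate(y[~miss], x[~miss], z[~miss])}  # complete records analysis
for model in ("no_interaction", "with_interaction", "stratified"):
    ests = [xz_estimate(y, impute_once(y, x, z, miss, rng, model), z) for _ in range(m_imp)]
    results[model] = float(np.mean(ests))  # pooled point estimate (mean over imputations)
print(results)
```

Under these settings the no-interaction model should pull the pooled XZ estimate toward zero, because a single Y slope is shared across both levels of Z, while the model with a Y-by-Z term and the stratified model fit the same conditional distribution and should agree closely with each other and with the true value of 1.0.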
MI excluding interactions was invalid, resulting in biased estimates and low coverage. When the true XZ interaction was zero, MI excluding interactions gave unbiased estimates but overcoverage. MI including interactions and stratified MI gave equivalent, valid inference in all cases. In the NCDS example, MI excluding interactions incorrectly concluded that there was no evidence for an important interaction.
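Coverage here refers to how often the pooled MI confidence interval contains the true value. The abstract does not state how results were combined across imputations; the usual choice is Rubin's rules, which this standard-library sketch assumes. The function names rubin_pool and covers are hypothetical, and the interval uses a normal quantile, a large-sample simplification of the t reference distribution with Rubin's degrees of freedom.

```python
from statistics import NormalDist, fmean, variance

def rubin_pool(estimates, variances):
    """Combine M completed-data estimates and squared standard errors with Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    q_bar = fmean(estimates)            # pooled point estimate
    w = fmean(variances)                # within-imputation variance
    b = variance(estimates)             # between-imputation variance (divisor m - 1)
    total = w + (1 + 1 / m) * b         # total variance
    if b == 0:
        return q_bar, total, float("inf")
    r = (1 + 1 / m) * b / w             # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2     # Rubin's (1987) degrees of freedom
    return q_bar, total, df

def covers(true_value, q_bar, total, level=0.95):
    """Does the normal-approximation interval q_bar +/- z * sqrt(total) contain true_value?"""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * total ** 0.5
    return q_bar - half_width <= true_value <= q_bar + half_width

q_bar, total, df = rubin_pool([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
print(q_bar, total, df)           # pooled estimate 2.0, total variance about 1.433, df about 2.31
print(covers(2.5, q_bar, total))  # True
print(covers(5.0, q_bar, total))  # False
```

In a simulation study, the share of replicates for which covers returns True estimates coverage; a share well above the nominal 0.95 is the overcoverage described above, meaning the pooled intervals are wider than the actual sampling variability warrants.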
Epidemiologists carrying out MI should ensure that their imputation model(s) are compatible with their analysis model.
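Compatibility can also be illustrated when the outcome itself is missing. In this sketch, missing values of a continuous Y are drawn from a normal linear regression using the standard draws of the residual variance and then the coefficients before drawing Y. All settings (coefficients, missingness probabilities, sample size, M = 10) and the helper impute_y_once are hypothetical. An imputation model containing only X and Z is incompatible with the analysis model Y ~ X + Z + XZ; adding the XZ term makes the two compatible.

```python
import numpy as np

def impute_y_once(design, y, miss, rng):
    """One proper imputation of missing Y from a normal linear regression
    fitted to the observed rows (draws sigma^2, then beta, then Y)."""
    d_obs, y_obs = design[~miss], y[~miss]
    n_obs, p = d_obs.shape
    beta_hat, *_ = np.linalg.lstsq(d_obs, y_obs, rcond=None)
    resid = y_obs - d_obs @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n_obs - p)   # residual-variance draw
    cov = sigma2 * np.linalg.inv(d_obs.T @ d_obs)
    beta_star = rng.multivariate_normal(beta_hat, cov)   # coefficient draw
    y_imp = y.copy()
    noise = rng.normal(0.0, np.sqrt(sigma2), int(miss.sum()))
    y_imp[miss] = design[miss] @ beta_star + noise
    return y_imp

rng = np.random.default_rng(80)
n, m_imp = 20_000, 10                                  # hypothetical settings
x = rng.binomial(1, 0.5, n).astype(float)
z = rng.binomial(1, 0.5, n).astype(float)
y = 0.5 * x + 0.5 * z + 1.0 * x * z + rng.normal(size=n)
# Hypothetical MAR mechanism: Y missing more often when X = 1 or Z = 1.
# (Here CRA would also be unbiased; the sketch only isolates compatibility.)
miss = rng.random(n) < 0.2 + 0.2 * x + 0.2 * z
y = np.where(miss, np.nan, y)

analysis = np.column_stack([np.ones(n), x, z, x * z])  # analysis model Y ~ X + Z + XZ
imputation_designs = {
    "incompatible (no XZ)": analysis[:, :3],
    "compatible (with XZ)": analysis,
}
pooled = {}
for name, design in imputation_designs.items():
    ests = []
    for _ in range(m_imp):
        coef, *_ = np.linalg.lstsq(analysis, impute_y_once(design, y, miss, rng), rcond=None)
        ests.append(coef[3])
    pooled[name] = float(np.mean(ests))  # pooled point estimate of the XZ coefficient
print(pooled)
```

Under these settings the incompatible model should attenuate the pooled XZ estimate well below its true value of 1.0, since imputed outcomes follow an additive model, whereas the compatible model should recover it up to Monte Carlo error.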