Harel Ofer, Perkins Neil, Schisterman Enrique F
Department of Statistics, University of Connecticut, USA.
Epidemiology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, MD, USA.
Sri Lankan J Appl Stat. 2014;5(4):227-246. doi: 10.4038/sljastats.v5i4.7792. Epub 2014 Dec 15.
Missing data due to the limit of detection (LOD) and the limit of quantification (LOQ) are a common obstacle in epidemiological and biomedical research. We are interested in methodologies that provide unbiased and efficient estimates in the presence of these missing data and that can be carried out with popular statistical software. We describe a multiple imputation (MI) procedure for cross-sectional and longitudinal data used to examine the sources of variation in hormone levels throughout the menstrual cycle, conditional on specific biomarkers. We describe the rationale, procedure, advantages, and disadvantages of the MI procedure, and we compare it with commonly used missing data procedures (complete case analysis and single imputation). We illustrate our approach using the BioCycle data, where we are interested in the effects of vitamin E and beta-carotene on progesterone levels. We also evaluate the longitudinal impact of changes in vitamin E on progesterone levels over time. Finally, we demonstrate the advantages of MI over complete case analysis or naive single replacement in both cross-sectional and longitudinal analyses where measurements below the LOQ are unreported. We also illustrate that, when available, including data below the LOD that are potentially deemed unreliable substantially improves simple estimation.
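The contrast between the three approaches named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure and does not use the BioCycle data. It assumes a single simulated biomarker that is normally distributed on the log scale, with values below a hypothetical log-LOQ `c` unreported, and it ignores the covariates and longitudinal structure that the paper conditions on. The function names `fit_censored_normal` and `mi_mean` are illustrative. Each imputation refits a left-censored normal model to a bootstrap resample (to carry parameter uncertainty), draws the unreported values from the fitted distribution truncated above at `c`, and the results are pooled with Rubin's rules.

```python
import math
import random
from statistics import NormalDist, mean, variance

STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf

def fit_censored_normal(obs, n_cens, c, iters=200):
    """EM estimate of (mu, sigma) when n_cens values are only known to be < c."""
    mu, sigma = mean(obs), math.sqrt(variance(obs))
    n = len(obs) + n_cens
    s1, s2 = sum(obs), sum(v * v for v in obs)
    for _ in range(iters):
        a = (c - mu) / sigma
        lam = STD.pdf(a) / STD.cdf(a)
        e1 = mu - sigma * lam                             # E[y | y < c]
        e2 = sigma**2 * (1 - a * lam - lam**2) + e1**2    # E[y^2 | y < c]
        mu = (s1 + n_cens * e1) / n
        sigma = math.sqrt((s2 + n_cens * e2) / n - mu**2)
    return mu, sigma

def mi_mean(obs, n_cens, c, m=20, seed=0):
    """Multiply impute values below c; pool the mean with Rubin's rules."""
    rng = random.Random(seed)
    n = len(obs) + n_cens
    ests, within = [], []
    for _ in range(m):
        # Bootstrap resample (None marks a censored value) so that each
        # imputation uses different plausible parameter values.
        boot = rng.choices(obs + [None] * n_cens, k=n)
        b_obs = [v for v in boot if v is not None]
        mu, sigma = fit_censored_normal(b_obs, n - len(b_obs), c)
        # Inverse-CDF draw from the fitted normal truncated above at c.
        p = STD.cdf((c - mu) / sigma)
        filled = obs + [mu + sigma * STD.inv_cdf(rng.uniform(1e-12, p))
                        for _ in range(n_cens)]
        ests.append(mean(filled))
        within.append(variance(filled) / n)
    qbar = mean(ests)
    total_var = mean(within) + (1 + 1 / m) * variance(ests)  # Rubin's rules
    return qbar, math.sqrt(total_var)

# Simulated log-concentrations (assumed true mean 1.0, SD 1.0).
rng = random.Random(42)
y = [rng.gauss(1.0, 1.0) for _ in range(500)]
c = 0.5                                   # hypothetical log(LOQ)
obs = [v for v in y if v >= c]
n_cens = len(y) - len(obs)

cc_est = mean(obs)                                      # complete case analysis
sub_value = c - 0.5 * math.log(2)                       # log(LOQ / sqrt(2))
sub_est = mean(obs + [sub_value] * n_cens)              # single substitution
mi_est, mi_se = mi_mean(obs, n_cens, c)                 # multiple imputation

print(f"complete case: {cc_est:.3f}")
print(f"substitution:  {sub_est:.3f}")
print(f"MI:            {mi_est:.3f} (SE {mi_se:.3f})")
```

In this setup the complete case mean is pulled upward because only the larger values are retained, while the MI estimate should sit close to the simulated mean and comes with a standard error that reflects the imputation uncertainty. How well single substitution performs depends on the censoring fraction and the chosen replacement value, which is part of why it is hard to recommend in general.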