Department of Biostatistical Sciences, Division of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, North Carolina 27157, USA.
Environ Health Perspect. 2011 Mar;119(3):351-6. doi: 10.1289/ehp.1002124. Epub 2010 Nov 19.
Environmental and biomedical researchers frequently encounter laboratory data constrained by a lower limit of detection (LOD). Commonly used methods to address these left-censored data, such as simple substitution of a constant for all values < LOD, may bias parameter estimation. In contrast, multiple imputation (MI) methods yield valid and robust parameter estimates and explicit imputed values for variables that can be analyzed as outcomes or predictors.
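The effect of simple substitution can be illustrated with a small simulation. The sketch below is not from the article: it assumes log-concentrations that are standard normal, a hypothetical LOD that censors roughly 40% of values, and two common substitution constants (LOD/2 and LOD/√2). All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: log-concentrations ~ Normal(0, 1); hypothetical LOD on the log scale.
n = 100_000
log_x = rng.normal(loc=0.0, scale=1.0, size=n)
log_lod = -0.25                      # about 40% of values fall below this
below = log_x < log_lod              # left-censoring indicator

def substitute(log_values, below_mask, log_constant):
    """Replace every value below the LOD with one constant (on the log scale)."""
    out = log_values.copy()
    out[below_mask] = log_constant
    return out

# LOD/2 and LOD/sqrt(2), expressed on the log scale.
sub_half = substitute(log_x, below, log_lod + np.log(0.5))
sub_sqrt2 = substitute(log_x, below, log_lod - 0.5 * np.log(2.0))

true_mean, true_sd = log_x.mean(), log_x.std(ddof=1)
mean_half, sd_half = sub_half.mean(), sub_half.std(ddof=1)
mean_sqrt2, sd_sqrt2 = sub_sqrt2.mean(), sub_sqrt2.std(ddof=1)

print(f"uncensored : mean={true_mean:.3f} sd={true_sd:.3f}")
print(f"LOD/2      : mean={mean_half:.3f} sd={sd_half:.3f}")
print(f"LOD/sqrt(2): mean={mean_sqrt2:.3f} sd={sd_sqrt2:.3f}")
```

Under these assumptions, both substitutions shrink the estimated standard deviation, because every nondetect is collapsed onto a single point, and the size of the bias in the mean depends on which constant is chosen.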
In this article we expand distribution-based MI methods for left-censored data to a bivariate setting, specifically, a longitudinal study with biological measures at two points in time.
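In a bivariate setting with left-censoring, each subject's contribution to the likelihood depends on which of the two measurements were detected. The block below is a sketch of the standard form of these contributions, not the article's own notation: it assumes (log-)concentrations $(Y_1, Y_2)$ that are bivariate normal with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$, correlation $\rho$, and detection limits $d_1, d_2$; $\phi$ and $\Phi$ are the standard normal density and distribution function, and $\Phi_2$ is the bivariate normal distribution function.

```latex
\begin{align*}
\text{both detected:}\quad
  & L_i = f(y_{i1}, y_{i2};\, \mu, \Sigma) \\
\text{$y_{i1}$ detected, $y_{i2} < d_2$:}\quad
  & L_i = \frac{1}{\sigma_1}\,\phi\!\left(\frac{y_{i1}-\mu_1}{\sigma_1}\right)
          \Phi\!\left(\frac{d_2-\mu_{2|1}(y_{i1})}{\sigma_{2|1}}\right) \\
\text{both below LOD:}\quad
  & L_i = \Phi_2(d_1, d_2;\, \mu, \Sigma) \\
\text{$y_{i1}$ detected, $y_{i2}$ missing:}\quad
  & L_i = \frac{1}{\sigma_1}\,\phi\!\left(\frac{y_{i1}-\mu_1}{\sigma_1}\right) \\
\text{$y_{i1} < d_1$, $y_{i2}$ missing:}\quad
  & L_i = \Phi\!\left(\frac{d_1-\mu_1}{\sigma_1}\right)
\end{align*}
\[
\mu_{2|1}(y) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}\,(y-\mu_1),
\qquad
\sigma_{2|1} = \sigma_2\sqrt{1-\rho^2}
\]
```

The remaining cases follow by swapping the indices 1 and 2, and the log-likelihood is the sum of $\log L_i$ over subjects. Under the missing-at-random assumption, a missing measurement simply drops out of the subject's contribution, leaving the marginal term for the other time point.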
We present the likelihood function for a bivariate normal distribution that accounts for values < LOD as well as missing data assumed to be missing at random. We use the estimated distributional parameters to impute values < LOD and to generate multiple plausible data sets for analysis by standard statistical methods. We conducted a simulation study to evaluate the sampling properties of the estimators, and we illustrate a practical application using data from the Community Participatory Approach to Measuring Farmworker Pesticide Exposure (PACE3) study to estimate associations between urinary acephate (APE) concentrations (indicating pesticide exposure) at two points in time and self-reported symptoms.
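The imputation step can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes the bivariate normal parameters have already been estimated (here they are simply set to hypothetical values), marks values < LOD as NaN, does not handle missing data, and ignores uncertainty in the parameter estimates, which a full MI procedure would need to propagate. The function name `impute_below_lod` and all parameter values are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

def impute_below_lod(y, lod, mu, sd, rho, rng):
    """Return a copy of y (n x 2, NaN = below LOD) with each NaN replaced by a
    draw from the bivariate normal, conditional on the censoring pattern."""
    y = np.array(y, dtype=float)
    cens = np.isnan(y)
    cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                    [rho * sd[0] * sd[1], sd[1] ** 2]])
    for i in range(y.shape[0]):
        c1, c2 = cens[i]
        if c1 and c2:
            # Both below LOD: rejection sampling from the joint distribution.
            while True:
                draw = rng.multivariate_normal(mu, cov)
                if draw[0] < lod[0] and draw[1] < lod[1]:
                    y[i] = draw
                    break
        elif c1 or c2:
            k = 0 if c1 else 1          # censored component
            j = 1 - k                   # detected component
            # Conditional normal of the censored value given the detected one.
            m = mu[k] + rho * sd[k] / sd[j] * (y[i, j] - mu[j])
            s = sd[k] * np.sqrt(1.0 - rho ** 2)
            upper = (lod[k] - m) / s    # standardized truncation point
            y[i, k] = truncnorm.rvs(-np.inf, upper, loc=m, scale=s,
                                    size=1, random_state=rng)[0]
    return y

# Hypothetical parameters and simulated data on the log scale.
rng = np.random.default_rng(1)
mu = np.array([0.0, 0.3])
sd = np.array([1.0, 1.2])
rho = 0.6
lod = np.array([-0.5, -0.5])
cov = np.array([[sd[0] ** 2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1] ** 2]])
full = rng.multivariate_normal(mu, cov, size=500)
observed = np.where(full < lod, np.nan, full)   # censor values below the LOD

# Generate several completed data sets for analysis by standard methods.
imputed_sets = [impute_below_lod(observed, lod, mu, sd, rho, rng)
                for _ in range(5)]
```

Each completed data set would then be analyzed with a standard method, and the results combined across data sets using the usual MI combining rules.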
Simulation study results demonstrated that imputed and observed values together were consistent with the assumed and estimated underlying distribution. Our analysis of PACE3 data using MI to impute APE values < LOD showed that urinary APE concentration was significantly associated with potential pesticide poisoning symptoms. Results based on simple substitution methods were substantially different from those based on the MI method.
The distribution-based MI method is a valid and feasible approach to analyze bivariate data with values < LOD, especially when explicit values for the nondetections are needed. We recommend the use of this approach in environmental and biomedical research.