From the Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI.
Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI.
Epidemiology. 2019 Sep;30(5):746-755. doi: 10.1097/EDE.0000000000001052.
Limit of detection (LOD) issues are ubiquitous in exposure assessment. Although there is an extensive literature on modeling exposure data under such imperfect measurement processes, including likelihood-based methods and multiple imputation, the standard practice continues to be naïve single imputation by a constant (e.g., [equation given in the full-text article]). In this article, we consider the situation where, because of the practical logistics of data accrual, sampling, and resource constraints, exposure data are analyzed in multiple batches in which the LOD and the proportion of censored observations differ across batches. Compounding this problem is the potential for nonrandom assignment of samples to batches, often driven by enrollment patterns and biosample storage. This issue is particularly important for binary outcome data, where batches may have different levels of outcome enrichment. We first consider variants of existing methods to address varying LODs across multiple batches. We then propose a likelihood-based multiple imputation strategy that imputes observations below the LOD while simultaneously accounting for differential batch assignment. Our simulation study shows that, provided distributional assumptions are satisfied, the proposed method has superior estimation properties (bias, coverage, and statistical efficiency) compared with standard alternatives. Additionally, in most batch assignment configurations, complete-case analysis can be made unbiased by including batch indicator terms in the analysis model, although this strategy is less efficient than the proposed method. We illustrate our method by analyzing data from a cohort study in Puerto Rico that is investigating the relation between endocrine disruptor exposures and preterm birth.
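The abstract does not spell out the imputation algorithm, so what follows is only a minimal sketch of one generic way to multiply impute values below batch-specific LODs, not the authors' procedure. It assumes that log-exposure is normal given an imputation design containing the binary outcome and batch indicators (an assumption standing in for "accounting for differential batch assignment"). It then alternates between drawing the normal-model parameters from their posterior under a flat prior and redrawing each censored value from a normal distribution truncated above at the log LOD of its own batch (data augmentation). The function names `impute_below_lod` and `rubin_pool`, their parameters, and all defaults are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def _draw_below(mean, sd, upper, rng):
    """One draw from N(mean, sd^2) truncated to (-inf, upper], by inverse CDF."""
    p_upper = _STD_NORMAL.cdf((upper - mean) / sd)
    u = rng.uniform(0.0, p_upper)
    u = min(max(u, 1e-300), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    # min() guards against floating-point overshoot past the LOD.
    return min(mean + sd * _STD_NORMAL.inv_cdf(u), upper)


def impute_below_lod(log_x, detected, lod, design, n_imputations=5,
                     burn_in=200, thin=20, seed=0):
    """Hypothetical data-augmentation MI for left-censored log-exposures.

    log_x    : log exposure; entries where detected is False are ignored
    detected : boolean array, True if the value is above its batch LOD
    lod      : per-observation LOD on the original scale (batch-specific)
    design   : n x p imputation design, assumed e.g. [1, outcome, batch dummies]
    Returns a list of n_imputations completed log-exposure arrays.
    """
    rng = np.random.default_rng(seed)
    log_x = np.asarray(log_x, dtype=float).copy()
    detected = np.asarray(detected, dtype=bool)
    upper = np.log(np.asarray(lod, dtype=float))
    D = np.asarray(design, dtype=float)
    n, p = D.shape
    censored = np.flatnonzero(~detected)
    log_x[censored] = upper[censored] - np.log(2.0)  # starting values only
    xtx_inv = np.linalg.inv(D.T @ D)
    xtx_inv = (xtx_inv + xtx_inv.T) / 2.0  # enforce exact symmetry
    completed = []
    for it in range(burn_in + n_imputations * thin):
        # (1) Draw sigma^2 and beta from their posterior given the current
        #     completed data, under the prior p(beta, sigma^2) ~ 1 / sigma^2.
        beta_hat = xtx_inv @ D.T @ log_x
        rss = float(np.sum((log_x - D @ beta_hat) ** 2))
        sigma2 = rss / rng.chisquare(n - p)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
        # (2) Redraw every censored value below its own batch's log LOD.
        mean = D @ beta
        sd = np.sqrt(sigma2)
        for i in censored:
            log_x[i] = _draw_below(mean[i], sd, upper[i], rng)
        # Keep every `thin`-th state after burn-in as one imputed dataset.
        if it >= burn_in and (it - burn_in + 1) % thin == 0:
            completed.append(log_x.copy())
    return completed


def rubin_pool(estimates, variances):
    """Pool M point estimates and their squared SEs with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    within = u.mean()
    between = q.var(ddof=1)
    return q.mean(), within + (1.0 + 1.0 / m) * between


if __name__ == "__main__":
    # Simulated data only: two batches with different LODs, where cases are
    # more likely to be assigned to batch 1 (outcome-enriched batch).
    rng = np.random.default_rng(1)
    n = 400
    y = rng.binomial(1, 0.3, n)
    batch = rng.binomial(1, np.where(y == 1, 0.7, 0.3))
    log_x = 0.5 * y + rng.normal(0.0, 1.0, n)
    lod = np.where(batch == 1, np.exp(0.3), np.exp(-0.5))
    detected = log_x > np.log(lod)
    design = np.column_stack([np.ones(n), y, batch])
    sets = impute_below_lod(log_x, detected, lod, design, n_imputations=5)
    print(len(sets), "completed datasets")
```

Each completed dataset would then be analyzed with the substantive model (for example, a logistic regression of the binary outcome on exposure), and the per-dataset estimates and squared standard errors pooled with `rubin_pool`. The burn-in and thinning values here are illustrative, not tuned, and the normality assumption on log-exposure is exactly the kind of distributional assumption the abstract says the method's advantages depend on.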