Mel & Enid Zuckerman College of Public Health, The University of Arizona, Tucson, Arizona, USA.
Appl Environ Microbiol. 2018 Oct 1;84(20). doi: 10.1128/AEM.01203-18. Print 2018 Oct 15.
Data below detection limits (left-censored data) are common in environmental microbiology, and decisions about how to handle censored data may have implications for quantitative microbial risk assessment (QMRA). In this paper, we use simulated data sets informed by real-world enterovirus water data to evaluate methods for handling left-censored data. Data sets were simulated with four censoring degrees (low [10%], medium [35%], high [65%], and severe [90%]) and one real-life censoring example (97%) and were informed by enterovirus data assuming a lognormal distribution with a limit of detection (LOD) of 2.3 genome copies/liter. For each data set, five methods for handling left-censored data were applied: (i) substitution with LOD/√2, (ii) lognormal maximum likelihood estimation (MLE) to estimate the mean and standard deviation, (iii) Kaplan-Meier (KM) estimation, (iv) imputation using MLE to estimate distribution parameters (MI method 1), and (v) imputation from a uniform distribution (MI method 2). Each data set mean was used to estimate enterovirus dose and infection risk. Root mean square error (RMSE) and bias were used to compare estimated doses and infection risks with known values. MI method 1 resulted in the lowest dose and infection risk RMSE and bias ranges for most censoring degrees, predicting infection risks that differed from known values by at most 1.17 × 10⁻² under 97% censoring. MI method 2 was the next-best method overall. For medium to severe censoring, MI method 1 may result in the least error. If the distribution is uncertain, MI method 2 may be preferred to avoid distribution misspecification.
This study evaluates methods for handling data with low (10%) to severe (90%) left-censoring within an environmental microbiology context and demonstrates that some of these methods may be appropriate when data containing concentrations below a limit of detection are used to estimate infection risks. Additionally, this study uses a skewed data set, which is an issue typically faced by environmental microbiologists.
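As a rough illustration of how four of the five approaches can be compared on simulated data, the sketch below uses only the Python standard library. It is not the authors' implementation. The log-scale standard deviation (1.0), the sample sizes, the number of imputations, the way the censoring degree is set (shifting the log-scale mean so that a chosen fraction falls below the LOD), and the reading of MI method 1 as draws from the fitted lognormal truncated at the LOD are all assumptions made for this example. Every function name is hypothetical. Kaplan-Meier estimation and the dose-response step are omitted.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

LOD = 2.3  # limit of detection from the text, genome copies/liter

def mu_for_censoring(p, sigma, lod=LOD):
    """Log-scale mean that puts a fraction p of a lognormal below the LOD."""
    return math.log(lod) - sigma * NormalDist().inv_cdf(p)

def simulate(n, mu, sigma, rng):
    """Draw n lognormal concentrations (mu and sigma are on the log scale)."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

def censor(values, lod=LOD):
    """Return the detected values and the number of non-detects (< LOD)."""
    detected = [v for v in values if v >= lod]
    return detected, len(values) - len(detected)

def mean_substitution(detected, n_cens, lod=LOD):
    """Method (i): every non-detect is replaced by LOD / sqrt(2)."""
    total = sum(detected) + n_cens * lod / math.sqrt(2)
    return total / (len(detected) + n_cens)

def fit_censored_lognormal(detected, n_cens, lod=LOD):
    """Method (ii): maximize the left-censored lognormal likelihood.

    Uses a plain compass search over (mu, log sigma), chosen here only to
    avoid third-party dependencies; needs at least 2 distinct detects.
    """
    logs = [math.log(v) for v in detected]
    log_lod = math.log(lod)

    def nll(mu, log_sigma):
        sigma = math.exp(log_sigma)
        # Detected values: normal log-density of log(x), constants dropped.
        ll = sum(-log_sigma - 0.5 * ((x - mu) / sigma) ** 2 for x in logs)
        # Non-detects: each contributes log P(X < LOD).
        p_below = NormalDist(mu, sigma).cdf(log_lod)
        ll += n_cens * math.log(max(p_below, 1e-300))
        return -ll

    mu, log_sigma = fmean(logs), math.log(pstdev(logs))
    best, step = nll(mu, log_sigma), 1.0
    while step > 1e-5:
        improved = False
        for d_mu, d_ls in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = nll(mu + d_mu, log_sigma + d_ls)
            if cand < best:
                mu, log_sigma, best = mu + d_mu, log_sigma + d_ls, cand
                improved = True
        if not improved:
            step /= 2
    return mu, math.exp(log_sigma)

def mean_mle_imputation(detected, n_cens, rng, lod=LOD, n_imputations=20):
    """MI method 1 as interpreted here: impute non-detects from the fitted
    lognormal truncated at the LOD, then average the completed-data means."""
    mu, sigma = fit_censored_lognormal(detected, n_cens, lod)
    dist = NormalDist(mu, sigma)
    p_lod = dist.cdf(math.log(lod))
    means = []
    for _ in range(n_imputations):
        # Inverse-CDF sampling restricted to probabilities in (0, p_lod].
        draws = [math.exp(dist.inv_cdf(p_lod * (1.0 - rng.random())))
                 for _ in range(n_cens)]
        means.append(fmean(detected + draws))
    return fmean(means)

def mean_uniform_imputation(detected, n_cens, rng, lod=LOD, n_imputations=20):
    """MI method 2 as interpreted here: impute from Uniform(0, LOD)."""
    means = []
    for _ in range(n_imputations):
        draws = [rng.uniform(0.0, lod) for _ in range(n_cens)]
        means.append(fmean(detected + draws))
    return fmean(means)

def rmse(estimates, truth):
    return math.sqrt(fmean((e - truth) ** 2 for e in estimates))

def bias(estimates, truth):
    return fmean(e - truth for e in estimates)

if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 1.0  # assumed log-scale SD, not taken from the paper
    for p in (0.10, 0.35, 0.65, 0.90):
        mu = mu_for_censoring(p, sigma)
        true_mean = math.exp(mu + sigma ** 2 / 2)
        results = {"substitution": [], "MLE": [], "MI 1": [], "MI 2": []}
        for _ in range(10):  # small number of replicate data sets
            detected, n_cens = censor(simulate(200, mu, sigma, rng))
            m, s = fit_censored_lognormal(detected, n_cens)
            results["substitution"].append(mean_substitution(detected, n_cens))
            results["MLE"].append(math.exp(m + s ** 2 / 2))
            results["MI 1"].append(mean_mle_imputation(detected, n_cens, rng))
            results["MI 2"].append(mean_uniform_imputation(detected, n_cens, rng))
        for name, ests in results.items():
            print(f"{p:.0%} {name}: RMSE={rmse(ests, true_mean):.3f} "
                  f"bias={bias(ests, true_mean):+.3f}")
```

In this sketch the known mean is the analytic lognormal mean, exp(mu + sigma²/2), so RMSE and bias are computed against an exact reference. Under the study design described above, each estimated mean would then be carried through to a dose and an infection risk before the same two error measures are applied.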