Kato Tsuyoshi, Miura Takayuki, Okabe Satoshi, Sano Daisuke
Department of Computer Science, Graduate School of Engineering, Gunma University, Tenjinmachi 1-5-1, Kiryu, Gunma, 376-8515, Japan.
Food Environ Virol. 2013 Aug 25. doi: 10.1007/s12560-013-9125-1.
Stochastic models are used to express pathogen density in environmental samples so that microbial risk assessment can be performed with quantified uncertainty. However, enteric virus density in water often falls below the quantification limit of the analytical method employed (a non-detect), and it is difficult to fit stochastic models to a dataset with a high proportion of non-detects, i.e., left-censored data. We applied a Bayesian model that handles both detected data (detects) and non-detects to simulated left-censored datasets of enteric virus density in wastewater. One hundred paired datasets were generated for each of 39 combinations of sample size and number of detects: three sample sizes (12, 24, and 48) were used, with the number of detects set to 1 through 12, 24, or 48 wherever it did not exceed the sample size. To create the censored datasets, the quantification limit was set so that each simulated dataset contained the assumed number of detects, and each simulated observation was assigned to one of two groups, detects or non-detects. The Bayesian model was then applied to the censored datasets, and the estimated mean and standard deviation were compared with the true values by root mean square deviation. The difference between the true distribution and the posterior predictive distribution was evaluated by Kullback-Leibler (KL) divergence, which showed that estimation accuracy was strongly affected by the number of detects. It is difficult to state universal criteria for what level of accuracy is sufficient, but eight or more detects are required to estimate the posterior predictive distribution accurately when the sample size is 12, 24, or 48. The posterior predictive distribution of virus removal efficiency in a wastewater treatment unit process was obtained as the distribution of the log ratio between the posterior predictive distributions of enteric virus density in untreated and treated wastewater. The KL divergence between the true distribution and the posterior predictive distribution of virus removal efficiency also depends on the number of detects, and eight or more detects in the treated wastewater dataset are required for its accurate estimation.
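The censoring step and the two ingredients the abstract names (a likelihood that uses both detects and non-detects, and KL divergence as the accuracy measure) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that log10 virus density is normally distributed, that each non-detect contributes the probability mass below the quantification limit, and that both distributions compared by KL divergence are normal so that a closed form applies; the paper's posterior predictive distribution would generally need numerical evaluation instead. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def censor_to_k_detects(values, k):
    """Choose a quantification limit so that exactly k values are detects.

    Returns (detects, limit, n_nondetects). Assumes no ties among values.
    """
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the sample size")
    if k == n:
        # Nothing is censored: place the limit just below the smallest value.
        limit = ordered[-1] - 1.0
    else:
        # Midpoint between the k-th and (k+1)-th largest values.
        limit = 0.5 * (ordered[k - 1] + ordered[k])
    return ordered[:k], limit, n - k


def censored_log_likelihood(mu, sigma, detects, limit, n_nondetects):
    """Log-likelihood of a normal model for left-censored data.

    Detects contribute the density at their value; each non-detect
    contributes the probability of falling below the limit.
    """
    dist = NormalDist(mu, sigma)
    ll = sum(math.log(dist.pdf(x)) for x in detects)
    if n_nondetects:
        # Guard against log(0) for extreme parameter values.
        ll += n_nondetects * math.log(max(dist.cdf(limit), 1e-300))
    return ll


def kl_normal(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for two univariate normal distributions (closed form)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


# Hypothetical example: 24 simulated log10 densities, 8 of them detects.
rng = random.Random(0)
true_mu, true_sigma = 2.0, 1.0
sample = [rng.gauss(true_mu, true_sigma) for _ in range(24)]
detects, limit, n_nondetects = censor_to_k_detects(sample, 8)
ll_true = censored_log_likelihood(true_mu, true_sigma, detects, limit, n_nondetects)
ll_far = censored_log_likelihood(10.0, 1.0, detects, limit, n_nondetects)
```

In a Bayesian treatment this log-likelihood would be combined with priors on the mean and standard deviation; the choice of priors and sampler is left open here because the abstract does not specify them.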