Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors.

Author information

Zhao Yuchao, Frey H Christopher

Affiliation

Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Raleigh, NC, USA.

Publication information

Risk Anal. 2004 Aug;24(4):1019-34. doi: 10.1111/j.0272-4332.2004.00504.x.

PMID:15357825

Abstract

Many environmental data sets, such as for air toxic emission factors, contain several values reported only as below detection limit. Such data sets are referred to as "censored." Typical approaches to dealing with the censored data sets include replacing censored values with arbitrary values of zero, one-half of the detection limit, or the detection limit. Here, an approach to quantification of the variability and uncertainty of censored data sets is demonstrated. Empirical bootstrap simulation is used to simulate censored bootstrap samples from the original data. Maximum likelihood estimation (MLE) is used to fit parametric probability distributions to each bootstrap sample, thereby specifying alternative estimates of the unknown population distribution of the censored data sets. Sampling distributions for uncertainty in statistics such as the mean, median, and percentile are calculated. The robustness of the method was tested by application to different degrees of censoring, sample sizes, coefficients of variation, and numbers of detection limits. Lognormal, gamma, and Weibull distributions were evaluated. The reliability of using this method to estimate the mean is evaluated by averaging the best estimated means of 20 cases for small sample size of 20. The confidence intervals for distribution percentiles estimated with bootstrap/MLE method compared favorably to results obtained with the nonparametric Kaplan-Meier method. The bootstrap/MLE method is illustrated via an application to an empirical air toxic emission factor data set.
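The bootstrap/MLE procedure summarized in the abstract can be illustrated with a minimal sketch. The code is an assumption-laden illustration, not the authors' implementation: it covers only the lognormal case, treats every non-detect as left-censored at its own detection limit, and obtains the censored-data MLE with an EM iteration, which is a choice made here because the abstract does not say which optimizer was used. The function names `fit_censored_lognormal` and `bootstrap_mean_interval` and the demo values are hypothetical and do not come from the paper.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for pdf and cdf


def fit_censored_lognormal(detected, limits, iterations=100):
    """Estimate (mu, sigma) of ln(x) by maximum likelihood.

    detected: fully observed values.
    limits:   detection limits of the non-detects (left-censored values).
    The MLE is computed with an EM iteration (an assumption of this sketch).
    """
    y = [math.log(v) for v in detected]
    c = [math.log(d) for d in limits]
    n = len(y) + len(c)
    # Starting guess: substitute each non-detect by its detection limit.
    start = y + c
    mu = sum(start) / n
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in start) / n), 1e-3)
    for _ in range(iterations):
        s1 = sum(y)                 # running sum of E[y_i]
        s2 = sum(v * v for v in y)  # running sum of E[y_i^2]
        for limit in c:
            # E-step: moments of a normal truncated to (-inf, limit].
            z = (limit - mu) / sigma
            lam = STD_NORMAL.pdf(z) / max(STD_NORMAL.cdf(z), 1e-300)
            m = mu - sigma * lam
            var = max(sigma ** 2 * (1.0 - z * lam - lam ** 2), 0.0)
            s1 += m
            s2 += var + m * m
        # M-step: update the normal parameters of ln(x).
        mu = s1 / n
        sigma = math.sqrt(max(s2 / n - mu * mu, 1e-12))
    return mu, sigma


def bootstrap_mean_interval(data, n_boot=200, level=0.95, seed=1):
    """Percentile interval for the lognormal mean from an empirical bootstrap.

    data: list of (value, is_censored); for a non-detect, value is the
    detection limit. Resamples with fewer than two detects are redrawn.
    """
    rng = random.Random(seed)
    means = []
    while len(means) < n_boot:
        sample = rng.choices(data, k=len(data))  # resample with replacement
        detected = [v for v, cens in sample if not cens]
        limits = [v for v, cens in sample if cens]
        if len(detected) < 2:
            continue
        mu, sigma = fit_censored_lognormal(detected, limits)
        means.append(math.exp(mu + 0.5 * sigma ** 2))  # lognormal mean
    means.sort()
    low = means[int((1.0 - level) / 2.0 * n_boot)]
    high = means[int((1.0 + level) / 2.0 * n_boot) - 1]
    return low, high


# Made-up demo data (not from the paper): 4 non-detects at a limit of 0.5.
demo = [(0.5, True)] * 4 + [(v, False) for v in
                            (0.6, 0.7, 0.9, 1.1, 1.4, 1.8, 2.3, 3.0, 4.2, 6.5)]
mu_hat, sigma_hat = fit_censored_lognormal(
    [v for v, cens in demo if not cens], [v for v, cens in demo if cens])
print("fitted mean:", math.exp(mu_hat + 0.5 * sigma_hat ** 2))
print("95% interval for the mean:", bootstrap_mean_interval(demo))
```

Because each bootstrap replicate is refit by MLE, the spread of the fitted means reflects uncertainty in the statistic, while each fitted distribution describes variability. The same loop could collect medians or other percentiles instead of the mean.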
