Hopke P K, Liu C, Rubin D B
Department of Chemistry, Clarkson University, Potsdam, New York 13699, USA.
Biometrics. 2001 Mar;57(1):22-33. doi: 10.1111/j.0006-341x.2001.00022.x.
Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets.
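As a rough illustration of the multiple-imputation idea for below-detection-limit values, the sketch below fills each censored entry several times and then pools the results with Rubin's combining rules. It is not one of the three models developed in the paper. It assumes a single variable on a log scale, a plug-in normal fit to the observed values only, and no time-series structure; a proper imputation would also draw the model parameters from their posterior. The function names, the `None` marker for censored entries, and the sample numbers are all hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def impute_once(values, limit, rng):
    """Return a completed copy of `values`; None marks a below-limit entry."""
    observed = [v for v in values if v is not None]
    # Assumption of this sketch: a plug-in normal fit from observed values only.
    dist = NormalDist(mean(observed), variance(observed) ** 0.5)
    # Keep the upper probability strictly inside (0, 1) so inv_cdf is defined.
    upper = min(max(dist.cdf(limit), 2e-12), 1 - 1e-12)
    completed = []
    for v in values:
        if v is None:
            # Inverse-CDF draw from the fitted normal truncated above at `limit`.
            v = dist.inv_cdf(rng.uniform(1e-12, upper))
        completed.append(v)
    return completed


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputations."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance
    return q_bar, within + (1 + 1 / m) * between


def multiply_impute_mean(values, limit, m=5, seed=0):
    """Estimate the mean of `values` from m completed data sets."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, limit, rng)
        estimates.append(mean(completed))
        variances.append(variance(completed) / len(completed))
    return pool(estimates, variances)


if __name__ == "__main__":
    # Hypothetical log-concentrations; None means "below the detection limit".
    sample = [1.2, None, 0.9, 1.5, None, 1.1, 0.8]
    print(multiply_impute_mean(sample, limit=0.5))
```

The between-imputation term in `pool` is what distinguishes multiple imputation from filling in each value once: it carries the uncertainty about the censored values into the final variance.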