Wang Hao, Heitjan Daniel F
Department of Biostatistics & Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA.
Stat Med. 2008 Aug 30;27(19):3789-804. doi: 10.1002/sim.3281.
In studies of smoking behavior, some subjects report exact cigarette counts, whereas others report rounded-off counts, particularly multiples of 20, 10 or 5. This form of data reporting error, known as heaping, can bias the estimation of parameters of interest such as mean cigarette consumption. We present a model to describe heaped count data from a randomized trial of bupropion treatment for smoking cessation. The model posits that the reported cigarette count is a deterministic function of an underlying precise cigarette count variable and a heaping behavior variable, both of which are at best partially observed. To account for an excess of zeros, as would likely occur in a smoking cessation study where some subjects successfully quit, we model the underlying count variable with zero-inflated count distributions. We study the sensitivity of the inference on smoking cessation by fitting various models that either do or do not account for heaping and zero inflation, comparing the models by means of Bayes factors. Our results suggest that sufficiently rich models for both the underlying distribution and the heaping behavior are indispensable to obtaining a good fit with heaped smoking data. The analyses moreover reveal that bupropion has a significant effect on the fraction abstinent, but not on mean cigarette consumption among the non-abstinent.
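The data-generating structure described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' fitted model: the abstract does not specify the count distribution, the heaping mechanism, or any parameter values, so the zero-inflated Poisson, the round-to-nearest-multiple rule, and every name and number below (`p_zero`, `lam`, `grid_probs`) are hypothetical choices made for illustration.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def draw_true_count(rng, p_zero, lam):
    """Zero-inflated Poisson (assumed): a structural zero (abstinence) with
    probability p_zero, otherwise an ordinary Poisson(lam) count."""
    if rng.random() < p_zero:
        return 0
    return draw_poisson(rng, lam)


def report(true_count, grid):
    """Reported count as a deterministic function of the latent true count and
    the latent heaping behavior. Assumed rule: round to the nearest multiple
    of `grid`, with ties rounded up; grid=1 means exact reporting."""
    return grid * ((true_count + grid // 2) // grid)


def simulate(n, p_zero, lam, grid_probs, seed=0):
    """Return a list of (true_count, grid, reported_count) triples.
    In real data only reported_count is observed."""
    rng = random.Random(seed)
    grids = list(grid_probs)
    weights = [grid_probs[g] for g in grids]
    rows = []
    for _ in range(n):
        x = draw_true_count(rng, p_zero, lam)
        g = rng.choices(grids, weights=weights)[0]
        rows.append((x, g, report(x, g)))
    return rows


if __name__ == "__main__":
    # Hypothetical settings: 30% abstinent, mean 18 cigarettes among the rest,
    # and a mix of exact reporters and heapers to multiples of 5, 10 and 20.
    rows = simulate(1000, p_zero=0.3, lam=18.0,
                    grid_probs={1: 0.4, 5: 0.2, 10: 0.2, 20: 0.2})
    true_mean = sum(x for x, _, _ in rows) / len(rows)
    reported_mean = sum(y for _, _, y in rows) / len(rows)
    print(f"mean of true counts:     {true_mean:.2f}")
    print(f"mean of reported counts: {reported_mean:.2f}")
```

Comparing the two printed means gives a rough sense of how coarse reporting can shift a summary statistic. Note that under this assumed rounding rule a small positive count can be reported as zero, which is a property of the sketch rather than a claim about the paper's model.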