He Hua, Wang Wenjuan, Crits-Christoph Paul, Gallop Robert, Tang Wan, Chen Ding-Geng Din, Tu Xin M
University of Rochester Medical Center.
University of Pennsylvania.
J Data Sci. 2014 Jul;12(3):439-460.
In alcohol studies, drinking outcomes such as the number of days of any alcohol drinking (DAD) over a period of time do not precisely capture the differences among subjects in a study population of interest. For example, a value of 0 on DAD could mean that the subject was continually abstinent from drinking, such as a lifetime abstainer, or that the subject was alcoholic but happened not to use any alcohol during the period of interest. In statistics, zeros of the first kind are called structural zeros, to distinguish them from the sampling zeros of the second kind. As the example indicates, structural and sampling zeros represent two groups of subjects with quite different psychosocial outcomes. In the literature on alcohol use, although many recent studies have begun to explicitly account for the differences between the two types of zeros when modeling drinking variables as a response, none has acknowledged the implications of the different types of zeros when such drinking variables are used as a predictor. This paper serves as the first attempt to tackle the latter issue and illustrate the importance of disentangling structural and sampling zeros using simulated as well as real study data.
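The point about zeros in a predictor can be illustrated with a small simulation. The following is a minimal sketch and not the paper's own simulation design: the sample size, the 30% share of structural zeros, the Poisson mean, the linear outcome model, and every coefficient value are assumptions chosen only for illustration. It compares a regression that uses DAD alone with one that also includes an indicator for structural zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000  # illustrative sample size (assumption)

# Hypothetical data-generating process; all values below are assumptions.
structural = rng.random(n) < 0.3                     # lifetime abstainers: DAD is always 0
dad = np.where(structural, 0, rng.poisson(3.0, n))   # drinkers: a 0 here is a sampling zero

# The outcome depends on DAD and shifts separately for structural zeros.
beta0, beta_dad, beta_struct = 1.0, 0.5, 2.0
y = beta0 + beta_dad * dad + beta_struct * structural + rng.normal(0.0, 1.0, n)

def ols(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Naive model: treats all zeros alike by using DAD as the only predictor.
naive = ols([dad], y)
# Adjusted model: adds an indicator that separates structural from sampling zeros.
adjusted = ols([dad, structural.astype(float)], y)

print("DAD coefficient, naive model:   ", naive[1])
print("DAD coefficient, adjusted model:", adjusted[1])
```

Under these assumptions, the naive model's DAD coefficient is pulled away from the value used to generate the data, because the two kinds of zeros are pooled, while the adjusted model recovers it closely. In real data the structural-zero status is often not directly observed, so the indicator used here is an idealization.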