Lorenz Eva, Jenkner Carolin, Sauerbrei Willi, Becher Heiko
Medical University of Innsbruck, Department of Internal Medicine V, Anichstraße 35, 6020, Innsbruck, Austria.
Clinical Trial Unit (C.J.), Freiburg University Medical Center, Freiburg, Germany.
Am J Epidemiol. 2017 Apr 15;185(8):650-660. doi: 10.1093/aje/kww122.
In most epidemiologic studies and in clinical research generally, there are variables with a spike at zero, namely variables for which a proportion of individuals have zero exposure (e.g., never smokers) and among those exposed the variable has a continuous distribution. Different options exist for modeling such variables, such as categorization where the nonexposed form the reference group, or ignoring the spike by including the variable in the regression model with or without some transformation or modeling procedures. It has been shown that such situations can be analyzed by adding a binary indicator (exposed/nonexposed) to the regression model, and a method based on fractional polynomials with which to estimate a suitable functional form for the positive portion of the spike-at-zero variable distribution has been developed. In this paper, we compare different approaches using data from 3 case-control studies carried out in Germany: the Mammary Carcinoma Risk Factor Investigation (MARIE), a breast cancer study conducted in 2002-2005 (Flesch-Janys et al., Int J Cancer. 2008;123(4):933-941); the Rhein-Neckar Larynx Study, a study of laryngeal cancer conducted in 1998-2000 (Dietz et al., Int J Cancer. 2004;108(6):907-911); and a lung cancer study conducted in 1988-1993 (Jöckel et al., Int J Epidemiol. 1998;27(4):549-560). Strengths and limitations of different procedures are demonstrated, and some recommendations for practical use are given.
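The indicator-plus-fractional-polynomial idea described above can be illustrated with a minimal sketch of how the two covariates might be constructed. The function names and the example data are hypothetical and are not taken from the paper. The power set is the one conventionally used for fractional polynomials and is an assumption here, since the abstract does not state it. Only a single-power (first-degree) term is shown; the selection procedure that chooses the functional form is not reproduced.

```python
import numpy as np

# Conventional fractional polynomial power set (assumption: not stated in
# the abstract); a power of 0 is interpreted as the natural logarithm.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Apply a first-degree fractional polynomial transform to positive x."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if power == 0 else x ** power


def spike_at_zero_design(x, power):
    """Build two covariate columns for a spike-at-zero variable.

    Column 0: binary indicator (1 = exposed, 0 = nonexposed).
    Column 1: transformed exposure for the exposed, 0 for the nonexposed,
              so the indicator alone captures the jump at zero.
    """
    x = np.asarray(x, dtype=float)
    exposed = x > 0
    fp_term = np.zeros_like(x)
    # Transform only the positive values; log and negative powers are
    # undefined at zero.
    fp_term[exposed] = fp_transform(x[exposed], power)
    return np.column_stack([exposed.astype(float), fp_term])


# Hypothetical exposure values: two nonexposed and two exposed individuals.
design = spike_at_zero_design([0, 0, 1, 4], power=0.5)
print(design)
```

In a case-control analysis these two columns would typically enter a logistic regression together with an intercept and any confounders; comparing fits across the candidate powers is one way a suitable functional form for the positive part could be chosen.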