Kipnis Victor, Freedman Laurence S, Carroll Raymond J, Midthune Douglas
Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, Maryland.
Information Management Services, Inc., Rockville, Maryland and Gertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer, Israel.
Biometrics. 2016 Mar;72(1):106-15. doi: 10.1111/biom.12377. Epub 2015 Aug 31.
Semicontinuous data in the form of a mixture of a large portion of zero values and continuously distributed positive values frequently arise in many areas of biostatistics. This article is motivated by the analysis of relationships between disease outcomes and intakes of episodically consumed dietary components. An important aspect of studies in nutritional epidemiology is that true diet is unobservable and commonly evaluated by food frequency questionnaires with substantial measurement error. Following the regression calibration approach for measurement error correction, unknown individual intakes in the risk model are replaced by their conditional expectations given mismeasured intakes and other model covariates. Those regression calibration predictors are estimated using short-term unbiased reference measurements in a calibration substudy. Since dietary intakes are often "energy-adjusted," e.g., by using ratios of the intake of interest to total energy intake, the correct estimation of the regression calibration predictor for each energy-adjusted episodically consumed dietary component requires modeling short-term reference measurements of the component (a semicontinuous variable), and energy (a continuous variable) simultaneously in a bivariate model. In this article, we develop such a bivariate model, together with its application to regression calibration. We illustrate the new methodology using data from the NIH-AARP Diet and Health Study (Schatzkin et al., 2001, American Journal of Epidemiology 154, 1119-1125), and also evaluate its performance in a simulation study.
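The regression calibration idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the bivariate semicontinuous model developed in the article: it assumes a single continuous intake, a linear outcome model rather than a disease risk model, and normally distributed errors. All names (`ols`, `simulate`) and all parameter values are hypothetical choices made for illustration only. The sketch simulates a questionnaire measurement with bias and random error, uses unbiased reference measurements in a calibration substudy to estimate the conditional expectation of intake given the questionnaire value, and then compares the naive slope with the calibrated one.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=20000, n_cal=5000, beta=1.0, seed=0):
    """Compare a naive slope with a regression-calibrated slope.

    Assumed (hypothetical) data-generating model:
      true intake  T ~ N(0, 1)
      questionnaire Q = 0.5 * T + error   (scale bias plus random error)
      reference     R = T + error         (unbiased, substudy only)
      outcome       Y = beta * T + error  (linear, for simplicity)
    """
    rng = random.Random(seed)
    true_intake = [rng.gauss(0, 1) for _ in range(n)]
    questionnaire = [0.5 * t + rng.gauss(0, 1) for t in true_intake]
    outcome = [beta * t + rng.gauss(0, 1) for t in true_intake]

    # Calibration substudy: the first n_cal participants also have an
    # unbiased short-term reference measurement.
    reference = [t + rng.gauss(0, 1) for t in true_intake[:n_cal]]

    # Estimate E[T | Q] by regressing the reference on the questionnaire.
    intercept, slope = ols(questionnaire[:n_cal], reference)
    predicted = [intercept + slope * q for q in questionnaire]

    # Naive analysis uses Q directly; the calibrated analysis replaces
    # the unknown intake with its estimated conditional expectation.
    _, naive_slope = ols(questionnaire, outcome)
    _, calibrated_slope = ols(predicted, outcome)
    return naive_slope, calibrated_slope


if __name__ == "__main__":
    naive, calibrated = simulate()
    print(f"naive slope:      {naive:.3f}")
    print(f"calibrated slope: {calibrated:.3f}")
```

Under these assumptions the naive slope is attenuated toward zero, while the calibrated slope should lie close to the value of `beta` used in the simulation. Handling an episodically consumed component with many zero reference measurements, or an energy-adjusted ratio, requires the kind of joint model the article develops and is outside the scope of this sketch.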