Cao Zhiqiang, Wong Man Yu, Cheng Garvin Hl
College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China.
Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong, China.
Stat Methods Med Res. 2023 Apr;32(4):789-805. doi: 10.1177/09622802231154324. Epub 2023 Feb 15.
Many areas of research, such as nutritional epidemiology, encounter measurement error in continuous covariates and misclassification of categorical variables when modeling. It is well known that ignoring measurement error or misclassification can lead to biased results. However, most research has addressed these two problems separately, and handling both simultaneously in a single analysis has been less actively studied. In this article, we propose a new correction method for logistic regression that simultaneously handles correlated measurement errors in multivariate continuous covariates and misclassification in a categorical variable. The method is not computationally intensive because a closed form of the approximate likelihood function, conditional on the observed covariates, is derived. The asymptotic normality of the proposed estimator is established under regularity conditions, and its finite-sample performance is examined empirically in simulation studies. We apply the new estimation method to handle measurement error in several nutrients of interest and misclassification of a categorical physical activity variable in the European Prospective Investigation into Cancer and Nutrition-InterAct Study data. The analyses show that fruit is negatively associated with type 2 diabetes among physically active women, that protein is positively associated with type 2 diabetes in the less physically active group, and that actual physical activity has a greater effect on reducing the risk of type 2 diabetes than observed physical activity.
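The claim that ignoring measurement error and misclassification biases results can be illustrated with a small simulation. The sketch below is not the correction method proposed in the article. It only fits a naive logistic regression to error-prone data and compares it with a fit on the true covariates. Every setting here is an assumption chosen for illustration: the coefficients, the sample size, the error variance of 1, the 20% misclassification rate, and the helper names `fit_logistic`, `solve`, and `simulate`.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_logistic(rows, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # observed information X'WX
        for x, yi in zip(rows, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    info[j][k] += w * x[j] * x[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta


def simulate(n=10000, seed=0):
    """Return (fit on true covariates, naive fit on error-prone covariates)."""
    rng = random.Random(seed)
    b0, b1, b2 = -0.5, 1.0, -1.0  # assumed true coefficients
    true_rows, obs_rows, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                 # true continuous covariate
        z = 1 if rng.random() < 0.5 else 0      # true binary covariate
        yi = 1 if rng.random() < sigmoid(b0 + b1 * x + b2 * z) else 0
        w = x + rng.gauss(0.0, 1.0)             # x observed with additive error
        z_obs = 1 - z if rng.random() < 0.2 else z  # z misclassified 20% of the time
        true_rows.append([1.0, x, float(z)])
        obs_rows.append([1.0, w, float(z_obs)])
        y.append(yi)
    return fit_logistic(true_rows, y), fit_logistic(obs_rows, y)


beta_true, beta_naive = simulate()
print("fit on true covariates :", [round(b, 3) for b in beta_true])
print("naive fit on error data:", [round(b, 3) for b in beta_naive])
```

Under these assumed settings, both slope estimates from the naive fit should be noticeably closer to zero than those from the fit on the true covariates, which is the attenuation that a correction method aims to undo.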