Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK.
Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.
Stat Med. 2020 Jul 20;39(16):2197-2231. doi: 10.1002/sim.8532. Epub 2020 Apr 3.
Measurement error and misclassification of variables occur frequently in epidemiology and involve variables important to public health. Their presence can strongly affect the results of statistical analyses involving such variables. However, investigators commonly fail to pay attention to the biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson models and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and the estimation of distributions. We outline the types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. We outline methods for ascertaining sample size requirements, both for ancillary studies designed to provide information about measurement error and for main studies in which the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and illustrate their use through examples drawn from the Observing Protein and Energy (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
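As a rough illustration of the ideas summarized above, the following Python sketch simulates a continuous covariate measured with classical, nondifferential error and compares a naive regression slope with regression calibration and SIMEX corrections. It is not taken from the article or from the OPEN study. The sample sizes, variances, the internal validation subsample in which the true covariate is observed, the assumption that the error variance is known for SIMEX, and all variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumptions, not values from the article).
n, beta = 20000, 1.0
sigma_x, sigma_u = 1.0, 1.0

x = rng.normal(0.0, sigma_x, n)          # true covariate X
w = x + rng.normal(0.0, sigma_u, n)      # classical error model: W = X + U
y = beta * x + rng.normal(0.0, 1.0, n)   # outcome depends on X only (nondifferential error)


def ols_slope(a, b):
    """Least-squares slope from regressing b on a (with an intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


# Naive analysis: regress Y on the error-prone W.
# Under these settings the slope is attenuated by
# sigma_x^2 / (sigma_x^2 + sigma_u^2) = 0.5 in expectation.
naive = ols_slope(w, y)

# Regression calibration, assuming a validation subsample in which X is observed:
# estimate E[X | W] there, then replace W by its predicted X in the main analysis.
val = slice(0, 2000)
lam = ols_slope(w[val], x[val])
intercept = x[val].mean() - lam * w[val].mean()
x_hat = intercept + lam * w
rc = ols_slope(x_hat, y)

# SIMEX, assuming sigma_u is known: add extra error with variance zeta * sigma_u^2,
# track how the slope degrades, and extrapolate the trend back to zeta = -1.
zetas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
n_rep = 50
mean_slopes = []
for zeta in zetas:
    if zeta == 0.0:
        mean_slopes.append(naive)
        continue
    reps = [
        ols_slope(w + rng.normal(0.0, np.sqrt(zeta) * sigma_u, n), y)
        for _ in range(n_rep)
    ]
    mean_slopes.append(np.mean(reps))

coef = np.polyfit(zetas, mean_slopes, 2)  # quadratic extrapolant
simex = np.polyval(coef, -1.0)

print(f"naive slope:            {naive:.3f}")
print(f"regression calibration: {rc:.3f}")
print(f"SIMEX (quadratic):      {simex:.3f}")
```

With this much error, the quadratic extrapolant used here removes only part of the attenuation, so the SIMEX estimate should be expected to fall between the naive and the regression calibration estimates. Standard errors are omitted; in practice they need to account for the uncertainty in the calibration or extrapolation step, for example by bootstrapping.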