National Center for Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD 20782, USA.
Stat Med. 2010 Feb 28;29(5):533-45. doi: 10.1002/sim.3809.
Common data sources for assessing the health of a population of interest include large-scale surveys based on interviews that often pose questions requiring a self-report, such as, 'Has a doctor or other health professional ever told you that you have [health condition of interest]?' or 'What is your (height/weight)?' Answers to such questions might not always reflect the true prevalences of health conditions (for example, if a respondent misreports height/weight or does not have access to a doctor or other health professional). Such 'measurement error' in health data could affect inferences about measures of health and health disparities. Drawing on two surveys conducted by the National Center for Health Statistics, this paper describes an imputation-based strategy for using clinical information from an examination-based health survey to improve on analyses of self-reported data in a larger interview-based health survey. Models predicting clinical values from self-reported values and covariates are fitted to data from the National Health and Nutrition Examination Survey (NHANES), which asks self-report questions during an interview component and also obtains clinical measurements during a physical examination component. The fitted models are used to multiply impute clinical values for the National Health Interview Survey (NHIS), a larger survey that obtains data solely via interviews. Illustrations involving hypertension, diabetes, and obesity suggest that estimates of health measures based on the multiply imputed clinical values differ from those based on the NHIS self-reported data alone and have smaller estimated standard errors than those based solely on the NHANES clinical data. The paper discusses the relationship of the methods used in the study to two-phase/two-stage/validation sampling and estimation, along with limitations, practical considerations, and areas for future research.
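The general workflow described above (fit a model on the survey that has both self-reported and clinical values, multiply impute clinical values in the interview-only survey, then combine the completed-data estimates) can be sketched as follows. This is a minimal illustration under strong assumptions, not the paper's actual models: the data are simulated, the imputation model is a plain normal linear regression of clinical BMI on self-reported BMI and age, survey weights and design effects are ignored, and the per-imputation variance uses the simple-random-sampling formula. All function names (`fit_ols`, `impute_once`, `rubin_combine`, `simulate`, `run_demo`) are hypothetical. The combining step follows Rubin's rules for multiple imputation.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit: coefficients, residual variance, (X'X)^-1, residual df."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    return beta, resid @ resid / df, XtX_inv, df

def impute_once(rng, X_new, beta, sigma2, XtX_inv, df):
    """One imputation: draw model parameters first, then draw outcomes."""
    # Drawing the parameters (rather than reusing the point estimates)
    # carries model uncertainty into the imputed values.
    sigma2_star = df * sigma2 / rng.chisquare(df)
    beta_star = rng.multivariate_normal(beta, sigma2_star * XtX_inv)
    noise = rng.normal(0.0, np.sqrt(sigma2_star), size=X_new.shape[0])
    return X_new @ beta_star + noise

def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and its standard error."""
    m = len(estimates)
    q_bar = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return q_bar, np.sqrt(within + (1.0 + 1.0 / m) * between)

def simulate(rng, n):
    """Simulated data (an assumption): self-reports understate clinical BMI."""
    age = rng.uniform(20, 80, n)
    clinical_bmi = rng.normal(28, 5, n)
    self_bmi = clinical_bmi - 0.5 - 0.05 * (clinical_bmi - 28) + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), self_bmi, age])
    return X, clinical_bmi

def run_demo(seed=0, m=20):
    rng = np.random.default_rng(seed)
    # "Examination" survey: both self-reported and clinical values observed.
    X_exam, y_exam = simulate(rng, 1000)
    # "Interview" survey: larger, clinical values treated as unobserved.
    X_int, _ = simulate(rng, 5000)

    beta, sigma2, XtX_inv, df = fit_ols(X_exam, y_exam)

    estimates, variances = [], []
    for _ in range(m):
        imputed_bmi = impute_once(rng, X_int, beta, sigma2, XtX_inv, df)
        p = np.mean(imputed_bmi >= 30)             # obesity prevalence
        estimates.append(p)
        variances.append(p * (1 - p) / len(imputed_bmi))  # SRS variance only

    self_report_prev = np.mean(X_int[:, 1] >= 30)
    mi_prev, mi_se = rubin_combine(estimates, variances)
    return self_report_prev, mi_prev, mi_se

if __name__ == "__main__":
    sr, mi, se = run_demo()
    print(f"self-report prevalence: {sr:.3f}")
    print(f"multiply imputed prevalence: {mi:.3f} (SE {se:.4f})")
```

An analysis of real NHANES and NHIS data would additionally need the surveys' sampling weights and design-based variance estimation, and would likely need richer imputation models than the single regression assumed here.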