Raghunathan Trivellore, Ghosh Kaushik, Rosen Allison, Imbriano Paul, Stewart Susan, Bondarenko Irina, Messer Kassandra, Berglund Patricia, Shaffer James, Cutler David
Department of Biostatistics, 1415 Washington Heights, University of Michigan, Ann Arbor, MI 48109; Survey Research Center, Institute for Social Research, 426 Thompson Street, Ann Arbor, MI 48106.
National Bureau of Economic Research (NBER), 1050 Massachusetts Ave, Cambridge, MA 02138.
J Surv Stat Methodol. 2020 Jun;9(3):598-625. doi: 10.1093/jssam/smz047. Epub 2020 Mar 20.
Information about an extensive set of health conditions on a well-defined sample of subjects is essential for assessing population health, gauging the impact of various policies, modeling costs, and studying health disparities. Unfortunately, no single data source provides accurate information about health conditions. We combine information from several administrative and survey data sets to obtain model-based dummy variables for 107 health conditions (diseases, preventive measures, and screening for diseases) for elderly subjects (age 65 and older) in the Medicare Current Beneficiary Survey (MCBS) over the fourteen-year period 1999-2012. The MCBS assesses disease prevalence from Medicare claims and provides detailed information on all health conditions, but it is prone to underestimation bias. The National Health and Nutrition Examination Survey (NHANES), on the other hand, collects self-reports and physical/laboratory measures for only a subset of the 107 health conditions. Neither source provides complete information, but we use them together to derive model-based corrected dummy variables in MCBS for the full range of existing health conditions using a missing data and measurement error model framework. We create multiply imputed dummy variables and use them to construct prevalence rate and trend estimates. The broader goal, however, is to use these corrected or modeled dummy variables in a multitude of policy analyses, cost models, and analyses of other relationships, as either predictors or outcome variables.
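As an illustration of how prevalence estimates can be built from multiply imputed dummy variables, the sketch below pools per-imputation prevalence estimates with Rubin's combining rules. The abstract does not state which combining rule the authors used, so Rubin's rules are an assumption here. The function name `pool_prevalence`, the toy data, and the simple-random-sampling variance (which ignores the MCBS survey weights and design) are likewise hypothetical simplifications.

```python
import math
from statistics import mean, variance


def pool_prevalence(imputed_indicators):
    """Pool a prevalence estimate over M imputed data sets (Rubin's rules).

    imputed_indicators: list of M lists of 0/1 values, one list per
    imputed copy of the same dummy variable (hypothetical input layout).
    Returns (pooled prevalence, pooled standard error).
    """
    m = len(imputed_indicators)
    if m < 2:
        raise ValueError("need at least two imputations")

    estimates = []  # per-imputation prevalence
    within = []     # per-imputation sampling variance
    for indicators in imputed_indicators:
        n = len(indicators)
        p = sum(indicators) / n
        estimates.append(p)
        # Assumption: simple random sampling, no survey weights or design.
        within.append(p * (1 - p) / n)

    q_bar = mean(estimates)     # pooled point estimate
    u_bar = mean(within)        # average within-imputation variance
    b = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total)


# Toy example: three imputed copies of one dummy variable for four subjects.
toy = [[1, 0, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1]]
prevalence, std_error = pool_prevalence(toy)
print(prevalence, std_error)
```

The between-imputation term carries the uncertainty about the corrected values themselves, so a standard error computed from a single imputed data set would be too small.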