Am J Epidemiol. 2024 Jan 8;193(1):203-213. doi: 10.1093/aje/kwad178.
We developed and validated a claims-based algorithm that classifies patients into obesity categories. Using Medicare (2007-2017) and Medicaid (2000-2014) claims data linked to 2 electronic health record (EHR) systems in Boston, Massachusetts, we identified a cohort of patients with an EHR-based body mass index (BMI) measurement (calculated as weight (kg)/height (m)²). We used regularized regression to select from 137 variables and built generalized linear models to classify patients with BMIs of ≥25, ≥30, and ≥40. We developed the prediction model using EHR system 1 (training set) and validated it in EHR system 2 (validation set). The cohort contained 123,432 patients in the Medicare population and 40,736 patients in the Medicaid population. The model comprised 97 variables in the Medicare set and 95 in the Medicaid set, including BMI-related diagnosis codes, cardiovascular and antidiabetic drugs, and obesity-related comorbidities. The areas under the receiver operating characteristic curve in the validation set were 0.72, 0.75, and 0.83 (Medicare) and 0.66, 0.66, and 0.70 (Medicaid) for BMIs of ≥25, ≥30, and ≥40, respectively. The positive predictive values were 81.5%, 80.6%, and 64.7% (Medicare) and 81.6%, 77.5%, and 62.5% (Medicaid) for BMIs of ≥25, ≥30, and ≥40, respectively. The proposed model can identify obesity categories in claims databases when BMI measurements are missing and can be used for confounding adjustment, defining subgroups, or probabilistic bias analysis.
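The quantities reported above (the BMI formula, the three BMI thresholds, the area under the receiver operating characteristic curve, and the positive predictive value) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy labels and scores in the usage note are hypothetical, and the regularized regression and generalized linear models themselves are not reproduced here.

```python
import math

# BMI cut points used in the abstract (>=25, >=30, >=40).
THRESHOLDS = (25, 30, 40)


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_flags(value):
    """Binary outcome per threshold, e.g. {25: True, 30: False, 40: False}."""
    return {t: value >= t for t in THRESHOLDS}


def ppv(y_true, y_pred):
    """Positive predictive value: true positives / all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fp) if (tp + fp) else math.nan


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half). Quadratic in sample size, so this
    form is only suitable for small illustrative inputs."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical inputs, `bmi(100, 2.0)` gives 25.0, which meets only the ≥25 threshold; `auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])` gives 0.75 because three of the four positive-negative pairs are ranked correctly.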