Research and Analytics, Collective Health, San Francisco, CA.
Center for Primary Care, Harvard Medical School, Boston, MA.
Med Care. 2019 Aug;57(8):592-600. doi: 10.1097/MLR.0000000000001147.
Social determinants of health (SDH) at the area level are understood to influence the likelihood of having poor glycemic control for patients with type 2 diabetes mellitus (T2DM).
To develop a model for predicting whether a person with T2DM has uncontrolled diabetes (hemoglobin A1c ≥9%), incorporating individual and area-level (census tract) covariates.
Development and validation of machine learning models.
A total of N=1,015,808 privately insured persons with T2DM, identified in claims data.
C-statistic, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy.
A standard logistic regression model selecting among the available individual-level covariates and area-level SDH covariates (at the census tract level) performed poorly, with a C-statistic of 0.685, sensitivity of 25.6%, specificity of 90.1%, positive predictive value of 56.9%, negative predictive value of 70.4%, and accuracy of 68.4% on a 25% held-out validation subset of the data. By contrast, machine learning models improved upon risk prediction, with the highest performance from a random forest algorithm with a C-statistic of 0.928, sensitivity of 68.5%, specificity of 94.6%, positive predictive value of 69.8%, negative predictive value of 94.3%, and accuracy of 90.6%. SDH variables alone explained 16.9% of variation in uncontrolled diabetes.
A predictive model developed through a machine learning approach may help health care organizations identify which area-level SDH data to monitor for prediction of diabetes control, for potential use in risk-adjustment and targeting.
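For readers less familiar with the reported measures, the following is a minimal sketch of how sensitivity, specificity, positive predictive value, negative predictive value, and accuracy follow from a confusion matrix. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data or code. The C-statistic is not computed here because it requires predicted probabilities rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts.

    tp/fn: persons with uncontrolled diabetes predicted positive/negative.
    fp/tn: persons without uncontrolled diabetes predicted positive/negative.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # true positives among actual positives
        "specificity": tn / (tn + fp),   # true negatives among actual negatives
        "ppv": tp / (tp + fp),           # true positives among predicted positives
        "npv": tn / (tn + fn),           # true negatives among predicted negatives
        "accuracy": (tp + tn) / total,   # all correct predictions
    }


# Hypothetical counts for illustration only (not taken from the study).
example = classification_metrics(tp=40, fp=20, tn=130, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

Because accuracy depends heavily on how common the outcome is, a model can combine high specificity and reasonable accuracy with low sensitivity, as the logistic regression results above illustrate.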