Chen Robert, Sun Jimeng, Dittus Robert S, Fabbri Daniel, Kirby Jacqueline, Laffer Cheryl L, McNaughton Candace D, Malin Bradley
IEEE J Biomed Health Inform. 2016 Jan 4. doi: 10.1109/JBHI.2016.2514264.
The goal of this study is to devise a machine learning framework that assists care coordination programs with prognostic stratification, so that they can design and deliver personalized care plans and allocate financial and medical resources effectively.
This study is based on a de-identified cohort of 2,521 hypertension patients from a chronic care coordination program at the Vanderbilt University Medical Center. Patients were modeled as vectors of features derived from electronic health records (EHRs) over a six-year period. We applied stepwise regression to identify risk factors associated with a decrease in mean arterial pressure of at least 2 mmHg after program enrollment. The resulting features were then validated with a logistic regression classifier. Finally, these risk factors were used to group the patients through model-based clustering.
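As an illustration of the outcome definition, the sketch below labels a patient as a responder when mean arterial pressure (MAP) falls by at least 2 mmHg after enrollment. It assumes the common clinical approximation MAP = diastolic + (systolic - diastolic) / 3 and summarises each period by its average MAP; the abstract does not state how MAP was computed or aggregated, so both choices, along with the function names, are hypothetical.

```python
def mean_arterial_pressure(systolic, diastolic):
    """Approximate MAP in mmHg as diastolic plus one third of pulse pressure.

    Assumption: this standard approximation stands in for whatever
    formula the study actually used.
    """
    return diastolic + (systolic - diastolic) / 3.0


def map_decreased(pre, post, threshold=2.0):
    """Return True if average MAP fell by at least `threshold` mmHg.

    `pre` and `post` are non-empty lists of (systolic, diastolic)
    readings taken before and after program enrollment. Averaging the
    readings within each period is an illustrative choice.
    """
    def period_mean(readings):
        values = [mean_arterial_pressure(s, d) for s, d in readings]
        return sum(values) / len(values)

    return period_mean(pre) - period_mean(post) >= threshold


# Hypothetical readings for one patient.
before = [(150, 90), (148, 92)]
after = [(140, 85), (138, 84)]
print(map_decreased(before, after))
```

A binary label of this kind is what a downstream classifier, such as the logistic regression described above, would be trained to predict.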
We identified a set of predictive features comprising a mix of demographic, medication, and diagnostic concepts. Logistic regression over these features yielded an area under the ROC curve (AUC) of 0.71 (95% CI: [0.67, 0.76]). Based on these features, clustering identified four clinically meaningful groups: two represented patients with more severe disease profiles, while the remaining two represented patients with mild disease profiles.
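For readers unfamiliar with how an AUC and its confidence interval can be obtained, the following is a minimal sketch using only the Python standard library. It computes AUC as the fraction of positive-negative pairs ranked correctly and estimates a 95% interval with a percentile bootstrap. The abstract does not say how its interval was derived, so the bootstrap, the function names, and the toy data are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (illustrative method only)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels and predicted probabilities, not study data.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formulation would scale better on a cohort of several thousand.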
Patients with hypertension can exhibit significant variation in their blood pressure control status and responsiveness to therapy. Yet this work shows that a clustering analysis can generate more homogeneous patient groups, which may aid clinicians in designing and implementing customized care programs.
The study shows that predictive modeling and clustering of EHR data can give care providers a systematic, generalized way to tailor their management approach to patient-level factors.