Greenwald Jeffrey L, Cronin Patrick R, Carballo Victoria, Danaei Goodarz, Choy Garry
*Department of Medicine, Core Educator Faculty †Laboratory of Computer Science, Massachusetts General Hospital, Boston ‡Partners HealthCare, Needham §Departments of Global Health and Population and Epidemiology, Harvard TH Chan School of Public Health ∥QPID Informatics and Massachusetts General Physicians Organization, Massachusetts General Hospital, Boston, MA.
Med Care. 2017 Mar;55(3):261-266. doi: 10.1097/MLR.0000000000000651.
With the increasing focus on reducing hospital readmissions in the United States, numerous readmission risk prediction models have been proposed, mostly developed through analyses of structured data fields in electronic medical records and administrative databases. Three areas that may affect readmission but are poorly captured by structured data sources are patients' physical function, cognitive status, and psychosocial environment and support.
The objective of the study was to build a discriminative model using information germane to these 3 areas to identify hospitalized patients' risk of 30-day all-cause readmission.
We conducted clinician focus groups to identify language used in the clinical record regarding these 3 areas. We then created a dataset of 30,000 inpatients, 10,000 from each of 3 hospitals, and searched those records for the focus group-derived language using natural language processing. A 30-day readmission prediction model was developed on 75% of the dataset and validated on the other 25% and also on hospital-specific subsets.
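As a rough illustration of the two steps described above (searching notes for focus group-derived language, then splitting the dataset 75%/25%), a minimal Python sketch follows. The phrase lists, variable names, and the functions `flag_variables` and `split_records` are hypothetical; the abstract does not give the study's actual terms, NLP tooling, or splitting procedure, and simple phrase matching of this kind does not handle negation (for example, "not confused").

```python
import random
import re

# Hypothetical phrase lists for illustration only; the study's actual
# focus group-derived language is not given in the abstract.
PHRASES = {
    "physical_function": ["uses a walker", "unsteady gait"],
    "cognitive_status": ["confused", "memory loss"],
    "psychosocial_support": ["lives alone", "no family support"],
}


def flag_variables(note):
    """Return one 0/1 indicator per variable: 1 if any of its phrases occurs in the note."""
    text = note.lower()
    return {
        name: int(any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases))
        for name, phrases in PHRASES.items()
    }


def split_records(records, derivation_fraction=0.75, seed=0):
    """Shuffle records and split them into derivation and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    print(flag_variables("Patient lives alone and uses a walker."))
    derivation, validation = split_records(range(100))
    print(len(derivation), len(validation))
```

The indicator dictionaries produced this way could then serve as candidate predictors for a regression model fitted on the derivation set.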
Focus group language was aggregated into 35 variables. The final model had 16 variables, a validated C-statistic of 0.74, and was well calibrated. Subset validation of the model by hospital yielded C-statistics of 0.70-0.75.
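For readers unfamiliar with the C-statistic, it is the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen patient who was not readmitted, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `c_statistic` and the toy data are illustrative and are not taken from the study.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for predicted risks against binary outcomes.

    risks: predicted probabilities of readmission.
    outcomes: 1 if the patient was readmitted, 0 otherwise.
    """
    positives = [r for r, y in zip(risks, outcomes) if y == 1]
    negatives = [r for r, y in zip(risks, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one readmitted and one non-readmitted patient")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0  # readmitted patient ranked higher
            elif p == n:
                concordant += 0.5  # tie counts as half
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 3 of the 4 readmitted/non-readmitted pairs are ranked correctly.
    print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so the reported 0.74 means that roughly three of every four such pairs were ordered correctly.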
Deriving a 30-day readmission risk prediction model by using natural language processing to identify physical, cognitive, and psychosocial issues yielded a model that performs similarly to the better-performing models previously published, with the added advantages of being based on clinically relevant factors and being automated and scalable. Because the variables in the model are clinically relevant, future research may be able to test whether targeting interventions to identified risks reduces readmissions.