A hospital wide predictive model for unplanned readmission using hierarchical ICD data.
Affiliations
Strategic Policy Cell at Ghent University Hospital, C. Heymanslaan 10, 9000 Ghent, Belgium.
Strategic Policy Cell at Ghent University Hospital, C. Heymanslaan 10, 9000 Ghent, Belgium; Department of Public Health and Primary Care, Ghent University, C. Heymanslaan 10, 9000 Ghent, Belgium.
Publication information
Comput Methods Programs Biomed. 2019 May;173:177-183. doi: 10.1016/j.cmpb.2019.02.007. Epub 2019 Feb 13.
BACKGROUND AND OBJECTIVE
Hospitals already acquire a large amount of data, mainly for administrative, billing and registration purposes. Tapping into these already available data could serve additional purposes, aimed at improving care, without significant incremental effort and cost. This potential of secondary patient data is explored by modeling administrative and billing data, as well as the hierarchical structure of pathology codes of the International Classification of Diseases (ICD), to predict unplanned readmissions, a clinically relevant outcome parameter that can be influenced in a quality improvement program.
METHODS
In this single-center, hospital-wide observational cohort study, we included all adult patients discharged in 2016 after applying an exclusion protocol (n = 29,702). In addition to administrative variables, such as age and length of stay, structured pathology data were taken into account in predictive models. As a first research question, we compared logistic regression against penalized logistic regression, gradient boosting and Random Forests to predict unplanned readmission. As a second research goal, we investigated the level of hierarchy within the pathology data needed to achieve the best accuracy. Finally, we investigated which prediction variables play a prominent role in predicting hospital readmission. The performance of all models was evaluated using the Area Under the ROC Curve (AUC) measure.
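As an illustration of the evaluation measure named above, the following is a minimal sketch of how an AUC can be computed from predicted readmission scores. It uses the rank-based (Mann-Whitney) formulation in plain Python. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: iterable of 0/1 outcomes (1 = unplanned readmission, hypothetical).
    scores: iterable of predicted risks, higher meaning more likely readmitted.
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i:j].
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example only: four discharges, two of them readmitted.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of readmitted from non-readmitted patients, which gives a scale for comparing the models.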
RESULTS
For all models, Random Forests gave the best predictive results. An added value of 7% was observed compared to a baseline method such as logistic regression. The best model, based on Random Forests, achieved an AUC of 0.77, using the diagnosis category and procedure code as the lowest level of the hierarchical pathology data.
CONCLUSIONS
The most accurate model to predict hospital-wide unplanned readmission is based on Random Forests and includes the ICD hierarchy, especially diagnosis category. Such an approach lowers the number of predictor variables and yields a higher interpretability than a model based on a detailed diagnosis. The performance of the model proved high enough to be used as a decision support tool.
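The effect of moving up the ICD hierarchy on the number of predictor variables can be sketched as follows. This example assumes ICD-10-style codes, where the first three characters form the diagnosis category; the abstract does not state which ICD revision was used. The helper names (`icd_levels`, `encode_at_level`, `vocabulary_size`) and the sample codes are hypothetical, and the leading letter is used only as a rough stand-in for a higher level, since real ICD chapters span ranges of categories.

```python
def icd_levels(code):
    """Split an ICD-10-style code into (leading letter, category, full code).

    Assumption: the category is the first three characters, e.g. "I50.9" -> "I50".
    """
    normalized = code.strip().upper().replace(".", "")
    if len(normalized) < 3:
        raise ValueError(f"code too short to contain a category: {code!r}")
    return normalized[0], normalized[:3], normalized


def encode_at_level(patient_codes, level):
    """Map each patient's diagnosis codes to a sorted list of distinct features.

    level: "letter", "category" or "full" (hypothetical names for this sketch).
    """
    index = {"letter": 0, "category": 1, "full": 2}[level]
    return [sorted({icd_levels(c)[index] for c in codes}) for codes in patient_codes]


def vocabulary_size(encoded):
    """Number of distinct predictor variables produced by an encoding."""
    return len({feature for row in encoded for feature in row})


if __name__ == "__main__":
    # Illustrative codes only, not data from the study.
    patients = [
        ["I50.9", "I50.1"],
        ["I10", "E11.9"],
        ["E11.65", "I50.9"],
    ]
    for level in ("full", "category", "letter"):
        encoded = encode_at_level(patients, level)
        print(level, vocabulary_size(encoded), encoded)
```

In this toy input, five distinct detailed codes collapse to three categories, which mirrors on a small scale how a category-level encoding lowers the number of predictor variables.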