Zhu K, Lou Z, Zhou J, Ballester N, Kong N, Parikh P
Nan Kong, 206 S. Martin Jischke Dr., West Lafayette, IN 47907, USA, E-mail:
Methods Inf Med. 2015;54(6):560-7. doi: 10.3414/ME14-02-0017. Epub 2015 Nov 9.
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare".
Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current risk prediction models based on logistic regression have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand.
Explore the use of conditional logistic regression to increase the prediction accuracy.
We analyzed a Healthcare Cost and Utilization Project (HCUP) statewide inpatient discharge record dataset from California, which includes patient demographic, clinical, and care utilization data. We extracted records of Medicare beneficiaries with heart failure who had an inpatient stay during an 11-month period. We corrected the data imbalance with under-sampling. We first applied standard logistic regression and a decision tree to identify influential variables and derive practically meaningful decision rules. We then stratified the original data set according to these rules and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross-validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models.
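The pipeline described above (under-sample, stratify by a decision rule, fit one logistic regression per stratum) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the field names (`discharged_home`, `n_chronic`, `n_procedures`, `readmitted`) and the single stratification rule are hypothetical, and the gradient-descent fit stands in for whatever statistical software the authors used. It is not the study's implementation.

```python
import math
import random

def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def undersample(rows, label_key="readmitted", seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return small + rng.sample(large, len(small))

def fit_logistic(X, y, lr=0.05, epochs=300):
    """Batch gradient descent on the logistic loss; returns [bias, w1, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Predicted readmission probability for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_stratified(rows, rule, features, label_key="readmitted"):
    """Split rows into strata with rule(row), then fit one model per stratum."""
    strata = {}
    for r in rows:
        strata.setdefault(rule(r), []).append(r)
    models = {}
    for key, group in strata.items():
        X = [[r[f] for f in features] for r in group]
        y = [r[label_key] for r in group]
        models[key] = fit_logistic(X, y)
    return models

# Synthetic, imbalanced records (hypothetical fields, not HCUP data).
rng = random.Random(42)
rows = []
for _ in range(400):
    home = rng.randint(0, 1)
    chronic = rng.randint(0, 8)
    procs = rng.randint(0, 4)
    p = sigmoid(-3.0 + (0.4 if home else 0.2) * chronic + 0.3 * procs)
    rows.append({"discharged_home": home, "n_chronic": chronic,
                 "n_procedures": procs, "readmitted": int(rng.random() < p)})

balanced = undersample(rows)
features = ["n_chronic", "n_procedures"]
# Hypothetical decision rule: stratify on discharge disposition.
models = fit_stratified(balanced, lambda r: r["discharged_home"], features)
risk = predict_proba(models[1], [5, 2])
```

Each stratum gets its own coefficients, so the effect of a predictor such as the number of chronic conditions is allowed to differ between patient groups while every individual model stays a readable logistic regression.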
The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tended to achieve sensitivity more than 10% higher than the standard classification models, which translates to correctly labeling an additional 400 to 500 readmissions among heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the discharge disposition location, the number of chronic conditions, and the number of acute procedures.
It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could also be beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed here offers insights into the future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of the need to collect data on additional markers and to develop the database infrastructure necessary for larger-scale exploratory studies on readmission risk prediction.
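As a rough illustration of what exploring pairwise interactions between predictors can look like in practice, the sketch below appends one product term per pair of predictors to a record before it is passed to a logistic regression. The helper name `add_pairwise_interactions` and the field names are hypothetical and chosen only for this example; the abstract does not state how the authors encoded interactions.

```python
from itertools import combinations

def add_pairwise_interactions(row, features):
    """Return a copy of row with a product term for every pair of features."""
    out = dict(row)
    for a, b in combinations(features, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# Hypothetical record with three predictors.
record = {"discharged_home": 1, "n_chronic": 3, "n_procedures": 2}
expanded = add_pairwise_interactions(
    record, ["discharged_home", "n_chronic", "n_procedures"]
)
```

With k predictors this adds k(k-1)/2 columns, so restricting the expansion to the few influential predictors within each stratum keeps the resulting model small enough to interpret.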