IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1968-1978. doi: 10.1109/TCBB.2018.2827029. Epub 2018 Apr 16.
With the increased use of electronic medical records (EMRs), data mining on medical data has great potential to improve the quality of hospital treatment and increase patient survival rates. Early readmission prediction enables early intervention, which is essential to preventing serious or life-threatening events and contributes substantially to reducing healthcare costs. Existing work on readmission prediction often focuses on specific vital signs and diseases and relies on extracted statistical features. It also fails to account for the skewness of class labels in medical data and the differing costs of misclassification errors. In this paper, we draw on the strengths of convolutional neural networks (CNNs) to automatically learn features from vital sign time series, and use categorical feature embedding to effectively encode feature vectors from heterogeneous clinical features, such as demographics, hospitalization history, vital signs, and laboratory tests. Both the features learnt by the CNN and the statistical features encoded by feature embedding are then fed into a multilayer perceptron (MLP) for prediction. We train the MLP with a cost-sensitive formulation to address the class imbalance and skewness problem. We validate the proposed approach on two real medical datasets from Barnes-Jewish Hospital. All data are taken from historical EMR databases and reflect the kinds of data that would realistically be available to a clinical prediction system in a hospital. We find that early prediction of readmission is possible and that our methods perform significantly better than the state-of-the-art methods currently used by hospitals. For example, on the general hospital wards data for 30-day readmission prediction, the area under the curve (AUC) of the proposed model was 0.70, significantly higher than that of all the baseline methods. Based on these results, a system using the proposed forecasting algorithms is being deployed in hospital settings to support treatment.
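The abstract does not spell out the cost-sensitive formulation, so the following is only a minimal sketch of one common way to realize the idea: a class-weighted binary cross-entropy in which missing a readmitted patient is penalized more heavily than raising a false alarm. The function name `cost_sensitive_log_loss` and the cost values `cost_fn` and `cost_fp` are illustrative assumptions, not taken from the paper.

```python
import math


def cost_sensitive_log_loss(y_true, p_pred, cost_fn=5.0, cost_fp=1.0, eps=1e-12):
    """Mean class-weighted binary cross-entropy.

    y_true  : list of 0/1 labels (1 = readmitted).
    p_pred  : list of predicted readmission probabilities.
    cost_fn : weight on errors for positive cases (hypothetical value).
    cost_fp : weight on errors for negative cases (hypothetical value).
    """
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("y_true and p_pred must be non-empty and equally long")

    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            # Low probability on a true readmission is weighted by cost_fn.
            total += -cost_fn * math.log(p)
        else:
            # High probability on a non-readmission is weighted by cost_fp.
            total += -cost_fp * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]          # skewed labels: one readmission in four
    probs = [0.2, 0.1, 0.1, 0.1]   # a model that under-predicts the positive
    print("unweighted:", cost_sensitive_log_loss(labels, probs, 1.0, 1.0))
    print("weighted:  ", cost_sensitive_log_loss(labels, probs))
```

With equal costs the function reduces to the ordinary log loss. Raising `cost_fn` makes the rare positive class contribute more to the training objective, which is the general effect a cost-sensitive formulation aims for on skewed labels.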