Zhang Xi Sheryl, Tang Fengyi, Dodge Hiroko H, Zhou Jiayu, Wang Fei
Department of Healthcare Policy and Research. Weill Cornell Medicine. Cornell University.
Department of Computer Science and Engineering. Michigan State University.
KDD. 2019 Aug;2019:2487-2495. doi: 10.1145/3292500.3330779.
In recent years, large amounts of health data, such as patient Electronic Health Records (EHR), have become readily available. This provides an unprecedented opportunity for knowledge discovery and data mining algorithms to extract insights that can later help improve the quality of care delivery. Predictive modeling of clinical risks from patient EHR, including in-hospital mortality, hospital readmission, chronic disease onset, and condition exacerbation, is one of the health data analytics problems that attracts considerable interest. The problem is not only important in clinical settings but also challenging, because of the sparsity, irregularity, and temporality of EHR data, among other issues. Unlike in domains such as computer vision and natural language processing, the data samples in medicine (patients) are relatively limited, which makes it difficult to build effective predictive models, especially complex ones such as deep learning models. In this paper, we propose MetaPred, a meta-learning framework for clinical risk prediction from longitudinal patient EHR. In particular, to predict the target risk with limited data samples, we train a meta-learner on a set of related risk prediction tasks so that it learns how a good predictor is trained. The meta-learned model can then be used directly for target risk prediction, and the limited samples available in the target domain can be used to further fine-tune model performance. The effectiveness of MetaPred is tested on a real patient EHR repository from Oregon Health & Science University. We demonstrate that, with a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) as base predictors, MetaPred achieves much better performance in predicting the target risk in low-resource settings than a predictor trained only on the limited samples available for that risk.
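The abstract does not spell out the optimization procedure, so the following is only an illustrative sketch of the general idea it describes: learn a shared initialization from related source tasks, then fine-tune on a few target samples. It assumes a Reptile-style first-order meta-update and a logistic-regression predictor on synthetic data. These are stand-ins for MetaPred's actual meta-learner, its CNN/RNN base predictors, and the real EHR repository, and every function name here is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Probability of the positive class under a linear logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def sgd_steps(w, data, lr=0.1, steps=5):
    # Return a copy of w after a few full-batch gradient steps on the logistic loss.
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def meta_train(source_tasks, dim, meta_lr=0.5, rounds=200, seed=0):
    # Reptile-style outer loop (an assumption, not the paper's stated method):
    # adapt to one sampled source task, then move the shared initialization
    # part of the way toward the adapted weights.
    rng = random.Random(seed)
    w0 = [0.0] * dim
    for _ in range(rounds):
        task = rng.choice(source_tasks)
        adapted = sgd_steps(w0, task)
        w0 = [a + meta_lr * (b - a) for a, b in zip(w0, adapted)]
    return w0


def make_task(rng, true_w, n):
    # Synthetic stand-in for one risk-prediction task (not real EHR data).
    data = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        y = 1.0 if rng.random() < predict(true_w, x) else 0.0
        data.append((x, y))
    return data


def accuracy(w, data):
    # Fraction of samples whose thresholded prediction matches the label.
    return sum((predict(w, x) > 0.5) == (y == 1.0) for x, y in data) / len(data)


rng = random.Random(42)
base = [2.0, -1.0, 0.5]  # structure shared by all related tasks

# Five data-rich source tasks and one target task with only ten training samples.
source_tasks = [
    make_task(rng, [b + rng.gauss(0, 0.3) for b in base], 200) for _ in range(5)
]
target_w = [b + rng.gauss(0, 0.3) for b in base]
target_train = make_task(rng, target_w, 10)
target_test = make_task(rng, target_w, 500)

w_meta = meta_train(source_tasks, dim=3)
w_finetuned = sgd_steps(w_meta, target_train)     # meta-learned init + fine-tuning
w_scratch = sgd_steps([0.0] * 3, target_train)    # same budget, no meta-learning

print("fine-tuned from meta-learned init:", accuracy(w_finetuned, target_test))
print("trained on target samples alone:  ", accuracy(w_scratch, target_test))
```

Both predictors get the same ten target samples and the same number of gradient steps, so any gap between the two printed accuracies comes from the initialization alone. That is the comparison the abstract describes, though on toy data it says nothing about the paper's reported results.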