Wang Xiang, Wang Fei, Hu Jianying, Sorrentino Robert
IBM T. J. Watson Research Center, Yorktown Heights, NY.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1180-7. eCollection 2014.
Disease risk prediction has been a central topic of medical informatics. Although various risk prediction models have been studied in the literature, the vast majority are single-task: they consider only one target disease at a time. This becomes a limitation when, in practice, we are dealing with two or more related diseases that share comorbidities, symptoms, risk factors, etc., because single-task prediction models are not equipped to identify these associations across tasks. In this paper we address this limitation by exploring the application of a multi-task learning framework to joint disease risk prediction. Specifically, we characterize disease relatedness by assuming that the risk predictors underlying these diseases overlap. We develop an optimization-based formulation that simultaneously predicts the risk for all diseases and learns the shared predictors. We apply our model to a real Electronic Health Record (EHR) database of 7,839 patients, of whom 1,127 developed Congestive Heart Failure (CHF) and 477 developed Chronic Obstructive Pulmonary Disease (COPD). We demonstrate that a properly designed multi-task learning algorithm is viable for joint disease risk prediction and can discover clinical insights that single-task models would overlook.
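The abstract does not spell out the optimization problem, so the following is only a sketch of one common way to encode "overlapping risk predictors", not the authors' formulation. It assumes a multi-task logistic regression with a row-wise (l2,1) penalty, fitted by proximal gradient descent, so that a feature is either kept for all diseases or dropped for all of them. The function names (`fit_joint_risk`, `shared_predictors`), the hyperparameters, and the 16-patient toy cohort are hypothetical and purely illustrative; none of it comes from the EHR data described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_joint_risk(X, Y, lam=0.05, lr=0.5, iters=300):
    """Assumed formulation: mean logistic loss summed over diseases plus
    lam * sum_j ||W[j]||_2, minimised by proximal gradient descent.
    W has one row per feature and one column per disease."""
    n, d, T = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * T for _ in range(d)]
    for _ in range(iters):
        # Gradient of the mean logistic loss for every (feature, disease) pair.
        G = [[0.0] * T for _ in range(d)]
        for x, y in zip(X, Y):
            for t in range(T):
                score = sum(x[j] * W[j][t] for j in range(d))
                resid = sigmoid(score) - y[t]
                for j in range(d):
                    G[j][t] += x[j] * resid / n
        # Gradient step, then group soft-thresholding of each feature row:
        # a row whose norm falls below lr * lam is zeroed for all diseases.
        for j in range(d):
            row = [W[j][t] - lr * G[j][t] for t in range(T)]
            norm = math.sqrt(sum(v * v for v in row))
            scale = max(0.0, 1.0 - lr * lam / norm) if norm > 0.0 else 0.0
            W[j] = [scale * v for v in row]
    return W

def shared_predictors(W):
    """Indices of features that keep a non-zero weight row."""
    return [j for j, row in enumerate(W) if any(v != 0.0 for v in row)]

# Synthetic toy cohort (not real data): feature 0 raises the rate of both
# outcomes, feature 1 is balanced within every (feature 0, label) group
# and therefore carries no signal.
X, Y = [], []
for x0 in (-1.0, 1.0):
    for x1 in (-1.0, 1.0):
        for k in range(4):
            chf = 1.0 if k < (3 if x0 > 0 else 1) else 0.0
            copd = 1.0 if k < (2 if x0 > 0 else 1) else 0.0
            X.append([x0, x1])
            Y.append([chf, copd])

W = fit_joint_risk(X, Y)
print(shared_predictors(W))  # prints [0]
```

In this toy setting the penalty removes the uninformative feature for both outcomes at once and keeps the informative one for both. That is the kind of shared structure that two independently fitted single-task models would have no mechanism to enforce.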