Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
Microsoft Research, Redmond, Washington, USA.
J Am Med Inform Assoc. 2014 Jul-Aug;21(4):699-706. doi: 10.1136/amiajnl-2013-002162. Epub 2014 Jan 30.
Data-driven risk stratification models built using data from a single hospital often suffer from a paucity of training data. However, leveraging data from other hospitals can be challenging owing to institutional differences in patient populations and in data coding and capture.
To investigate three approaches to learning hospital-specific predictions about the risk of hospital-associated infection with Clostridium difficile, and to perform a comparative analysis of the value of different ways of using external data to enhance hospital-specific predictions.
We evaluated each approach on 132 853 admissions from three hospitals, varying in size and location. The first approach was a single-task approach, in which only training data from the target hospital (ie, the hospital for which the model was intended) were used. The second used only data from the other two hospitals. The third approach jointly incorporated data from all hospitals while seeking a solution in the target space.
The relative performance of the three different approaches was found to be sensitive to the hospital selected as the target. However, incorporating data from all hospitals consistently had the highest performance.
The results characterize the challenges and opportunities that come with (1) using data or models from collections of hospitals without adapting them to the site at which the model will be used, and (2) using only local data to build models for small institutions or rare events.
We show how external data from other hospitals can be successfully and efficiently incorporated into hospital-specific models.
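As a minimal sketch of how the three training configurations described above differ, the following Python assembles the training set each approach would see for a chosen target hospital. It is illustrative only, not the authors' implementation: the function names, the sparse feature dictionaries, the toy records, and the reading of "a solution in the target space" as "keep only the features the target hospital records" are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# One admission: (sparse feature dict, infection label). Hypothetical layout.
Row = Tuple[Dict[str, float], int]


def project_to_target(features: Dict[str, float], target_features: Set[str]) -> Dict[str, float]:
    """Drop features the target hospital never records (assumed meaning of 'target space')."""
    return {name: value for name, value in features.items() if name in target_features}


def build_training_sets(data: Dict[str, List[Row]], target: str) -> Dict[str, List[Row]]:
    """Return the training rows used by each of the three approaches."""
    target_rows = data[target]

    # Feature names observed at the target hospital.
    target_features: Set[str] = set()
    for features, _ in target_rows:
        target_features.update(features)

    # Rows from every hospital except the target, in their original coding.
    external_rows = [row for name, rows in data.items() if name != target for row in rows]

    # The same external rows, restricted to the target's feature set.
    projected_rows = [
        (project_to_target(features, target_features), label)
        for features, label in external_rows
    ]

    return {
        "target_only": list(target_rows),                    # approach 1
        "external_only": external_rows,                      # approach 2
        "all_hospitals": list(target_rows) + projected_rows, # approach 3
    }


# Toy data for three hypothetical hospitals; values are made up.
data = {
    "A": [({"age": 70.0, "code_a1": 1.0}, 1), ({"age": 40.0}, 0)],
    "B": [({"age": 55.0, "code_b9": 1.0}, 0)],
    "C": [({"age": 63.0, "code_a1": 1.0, "code_c3": 1.0}, 1)],
}
sets = build_training_sets(data, target="A")
```

Each of the three lists could then be passed to the same classifier, so that differences in performance reflect the choice of training data rather than the choice of model.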