Lee Joon, Maslove David M, Dubin Joel A
School of Public Health and Health Systems, University of Waterloo, Waterloo, Ontario, Canada.
Department of Medicine & Critical Care Program, Queen's University, Kingston, Ontario, Canada.
PLoS One. 2015 May 15;10(5):e0127428. doi: 10.1371/journal.pone.0127428. eCollection 2015.
Clinical outcome prediction normally employs static, one-size-fits-all models that perform well for the average patient but are sub-optimal for individual patients with unique characteristics. In the era of digital healthcare, it is feasible to dynamically personalize decision support by identifying and analyzing similar past patients, in a way that is analogous to personalized product recommendation in e-commerce. Our objectives were: 1) to prove that analyzing only similar patients leads to better outcome prediction performance than analyzing all available patients, and 2) to characterize the trade-off between training data size and the degree of similarity between the training data and the index patient for whom prediction is to be made.
We applied a cosine-similarity-based patient similarity metric (PSM) to an intensive care unit (ICU) database to identify the patients most similar to each index patient and subsequently to custom-build 30-day mortality prediction models. Rich clinical and administrative data from the first day in the ICU from 17,152 adult ICU admissions were analyzed. The results confirmed that training on data from only a small subset of the most similar patients improves predictive performance compared with training on data from all available patients. The results also showed that when too few similar patients are used for training, predictive performance degrades because of the small sample size. Our PSM-based approach outperformed well-known ICU severity of illness scores. Although the improved predictive performance comes at the cost of increased computational burden, Big Data technologies can help realize personalized data-driven decision support at the point of care.
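The similarity-based workflow described above can be illustrated with a minimal sketch. It is not the study's implementation. The function names, the two-feature toy vectors, and the neighbour-death-fraction predictor are all assumptions made for illustration. The abstract specifies only that cosine similarity was used to select the most similar past patients before a custom model was built.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_a * norm_b)


def most_similar(index_patient, past_patients, k):
    """Indices of the k past patients most similar to the index patient."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scored = [
        (cosine_similarity(index_patient, p), i)
        for i, p in enumerate(past_patients)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def predict_mortality(index_patient, past_patients, died, k):
    """Estimated mortality risk: fraction of deaths among the k nearest patients.

    This simple estimator is a stand-in (an assumption) for whatever
    custom-built model is trained on the selected subset.
    """
    neighbours = most_similar(index_patient, past_patients, k)
    return sum(died[i] for i in neighbours) / len(neighbours)


# Hypothetical toy data: each row is a (already scaled) feature vector.
past_patients = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
died = [1, 1, 0, 0]  # 1 = died within 30 days, 0 = survived
index_patient = [1.0, 0.05]

risk_similar_only = predict_mortality(index_patient, past_patients, died, k=2)
risk_all_patients = predict_mortality(index_patient, past_patients, died, k=4)
print(risk_similar_only, risk_all_patients)
```

In this toy setting, `k` controls the trade-off named in the objectives: a small `k` keeps only closely matching patients but leaves few training examples, while `k` equal to the full cohort size reduces to using all available patients.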
The present study provides crucial empirical evidence for the promising potential of personalized data-driven decision support systems. With the increasing adoption of electronic medical record (EMR) systems, our novel medical data analytics contributes to meaningful use of EMR data.