Post Benjamin, Klapaukh Roman, Brett Stephen J, Faisal A Aldo
Department of Bioengineering, Imperial College London, London, UK; Department of Computing, Imperial College London, London, UK; UKRI Centre in AI for Healthcare, Imperial College London, London, UK.
Department of Chemical Engineering, Imperial College London, London, UK.
Lancet Digit Health. 2025 Feb;7(2):e124-e135. doi: 10.1016/S2589-7500(24)00254-1.
Unplanned hospital admissions are associated with worse patient outcomes and cause strain on health systems worldwide. Primary care electronic health records (EHRs) have successfully been used to create prediction models for emergency hospitalisation, but these approaches require a broad range of diagnostic, physiological, and laboratory values. In this study, we aimed to capture temporal patterns of patient activity from EHR data and evaluate their effectiveness in predicting emergency hospital admissions compared with conventional methods.
In this retrospective observational study, we used the Secure Anonymised Information Linkage databank to extract temporal patterns of primary care activity from undifferentiated EHR timestamp data for 1·37 million patients in Wales aged 18-80 years with at least one recorded Read code between 2016 and 2018. Using Gaussian mixture modelling, we grouped patients into distinct temporal clusters, performed a three-stage validation of our approach, and calculated the risk of emergency hospital admission for each temporal cluster. Finally, these temporal clusters were combined with five administrative variables and incorporated into four emergency hospital admission prediction models (logistic regression, naive Bayes, XGBoost, and multilayer perceptron [MLP]), which were compared with a more traditional, but data-intensive, modelling technique. The primary outcome was emergency hospital admission as the next health-care event.
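The clustering step can be illustrated with a minimal sketch. It is not the study's pipeline: the abstract does not say how timestamps were turned into features, so the single feature used here (a patient's mean gap in days between recorded events), the two-component mixture, the synthetic data, and the helper names `fit_gmm_1d` and `assign_cluster` are all assumptions made for illustration. The sketch fits a one-dimensional Gaussian mixture with plain expectation-maximisation and then computes an admission rate per cluster.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialise means at evenly spaced quantiles; shared variance; equal weights.
    means = [xs_sorted[int((i + 0.5) * n / k)] for i in range(k)]
    overall = sum(xs) / n
    variances = [sum((x - overall) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6
            )
            weights[j] = nj / n
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Index of the component with the highest weighted density at x."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(dens)), key=dens.__getitem__)


# Synthetic cohort (assumption): frequent attenders vs infrequent attenders.
random.seed(0)
gaps = [random.gauss(10, 2) for _ in range(200)] + [random.gauss(60, 8) for _ in range(200)]
# Synthetic outcome (assumption): frequent attenders are admitted more often.
admitted = [random.random() < 0.30 for _ in range(200)] + [
    random.random() < 0.05 for _ in range(200)
]

weights, means, variances = fit_gmm_1d(gaps, k=2)
clusters = [assign_cluster(g, weights, means, variances) for g in gaps]

# Emergency admission rate within each temporal cluster.
rates = []
for c in range(2):
    members = [a for a, cl in zip(admitted, clusters) if cl == c]
    rates.append(sum(members) / len(members))
```

In practice a library implementation such as scikit-learn's `GaussianMixture` would be the usual choice, with the number of components selected by a criterion such as BIC; the hand-written EM loop is kept only so that the sketch has no dependencies.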
Six distinct temporal cluster patterns of primary care EHR activity were identified, each associated with a different risk of future emergency hospital admission. These patterns were visually interpretable, repeatable at a population level, and clinically plausible. The best emergency hospital admission prediction model (MLP) achieved an area under the receiver operating characteristic curve (AUROC) of 0·82 and precision of 0·94 in regional cohorts. In external validation in regional cohorts, similar model performance was observed (AUROC 0·82 and precision 0·92). This model also matched the performance of a more complex model (extended feature model) requiring 33 clinical parameters (AUROC 0·82 vs 0·83; precision 0·94 vs 0·90) for the same task on the same dataset.
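For readers less familiar with the two reported metrics, the following minimal sketch shows how AUROC and precision are computed from predicted scores. The labels and scores are made-up toy values, not study data, and the 0.5 decision threshold is an assumption; the abstract does not state the threshold used for precision.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision(labels, scores, threshold=0.5):
    """Fraction of predicted positives that are true positives."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy example: 1 = emergency admission was the next health-care event.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

toy_auroc = auroc(labels, scores)          # 15 of 16 positive-negative pairs ranked correctly
toy_precision = precision(labels, scores)  # 3 true positives among 4 predicted positives
```

AUROC is threshold-free and measures ranking quality, whereas precision depends on the chosen threshold, which is why the two figures can move independently when models are compared.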
We developed a novel machine learning pipeline that extracts interpretable temporal patterns from simple representations of EHR data and can be incorporated into emergency hospital admission predictors. This framework might enable more rapid development of parsimonious clinical prediction models.
UKRI CDT in AI for Healthcare, UKRI Turing AI Fellowship, NIHR Imperial Biomedical Research Centre, and Research Capability Funding.