Li Jingteng, Zakka Kimberley R, Booth John, Rigny Louise, Ray Samiran, Cortina-Borja Mario, Barnaghi Payam, Sebire Neil
Great Ormond Street Institute of Child Health, University College London, London, UK.
Data Research Innovation and Virtual Environment, Great Ormond Street Hospital for Children, London, UK.
BMC Med Inform Decis Mak. 2025 Jan 28;25(1):45. doi: 10.1186/s12911-024-02812-9.
Unsupervised feature learning methods inspired by natural language processing (NLP) models are capable of constructing patient-specific features from longitudinal Electronic Health Records (EHR).
We applied document embedding algorithms to real-world paediatric intensive care unit (PICU) EHR data to extract patient-specific features from 1853 patients' PICU journeys using 647 unique lab tests and medication events. We evaluated the clinical utility of the patient features via a K-means clustering analysis.
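To make the "patient journey as a document" idea concrete, here is a minimal sketch of how time-stamped lab and medication events might be turned into one token sequence per patient before any embedding model is trained. The record layout, the `type:code` token format and the function name `build_patient_documents` are illustrative assumptions, not details taken from the study.

```python
from collections import defaultdict

# Hypothetical event records (assumed layout, not the study's schema):
# (patient_id, hours_since_admission, event_type, event_code)
events = [
    ("p1", 5.0, "med", "morphine"),
    ("p1", 1.0, "lab", "lactate"),
    ("p2", 2.0, "lab", "sodium"),
    ("p1", 3.0, "lab", "sodium"),
    ("p2", 0.5, "med", "midazolam"),
]


def build_patient_documents(records):
    """Group events by patient and order them by time, one token per event."""
    by_patient = defaultdict(list)
    for patient_id, hours, event_type, code in records:
        by_patient[patient_id].append((hours, f"{event_type}:{code}"))
    # Sorting the (time, token) pairs puts each journey in chronological order.
    return {
        patient_id: [token for _, token in sorted(timed_tokens)]
        for patient_id, timed_tokens in by_patient.items()
    }


documents = build_patient_documents(events)

# The vocabulary plays the role of the set of unique lab and medication events.
vocabulary = sorted({token for doc in documents.values() for token in doc})
```

Each resulting token list can then be passed to a document embedding model in the same way a tokenised text document would be.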
We trained a document embedding model under a unique evaluation pipeline and obtained latent patient feature vectors for all 1853 patients. We performed unsupervised clustering on the patient vectors as a downstream analysis and obtained 5 distinct clusters via hyperparameter optimisation. Significant variations (p<0.0001) were detected across clusters in patient characteristics and in surgical intervention and diagnostic profiles.
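The downstream clustering step can be sketched as follows: run K-means on the patient vectors for several candidate values of k and compare a fit criterion. This is a plain-Python illustration on toy two-dimensional vectors. The within-cluster sum of squares used here is an assumed criterion, since the abstract does not state which one the hyperparameter optimisation relied on, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Index of the nearest centroid for each vector."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(v, centroids[j]))
        for v in vectors
    ]


def kmeans(vectors, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        labels = assign(vectors, centroids)
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = assign(vectors, centroids)
    inertia = sum(squared_distance(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, centroids, inertia


# Toy stand-ins for latent patient vectors: two well-separated groups.
vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]

# Within-cluster sum of squares for each candidate number of clusters.
inertia_by_k = {k: kmeans(vectors, k)[2] for k in (1, 2, 3)}
```

On real patient vectors the candidate range would be wider, and the chosen k would be checked against clinical variables, as the comparison of cluster profiles in the abstract does.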
The K-means clustering results demonstrated the clinical utility of the patient-specific features learned by the embedding algorithms. The latent patient features obtained via the embedding process enabled the direct application of other machine learning algorithms. Future work will focus on utilising the temporal information within EHR and extending EHR embedding algorithms to develop personalised patient journey predictions.