Choi Inyoung, Long Qi, Getzen Emily
School of Engineering and Applied Sciences at the University of Pennsylvania, Philadelphia, PA.
Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA.
medRxiv. 2024 May 7:2024.05.06.24306959. doi: 10.1101/2024.05.06.24306959.
Electronic health records offer great promise for early disease detection, treatment evaluation, information discovery, and other important facets of precision health. Clinical notes, in particular, may contain nuanced information about a patient's condition, treatment plans, and history that structured data may not capture. As a result, and with advancements in natural language processing, clinical notes have been increasingly used in supervised prediction models. To predict long-term outcomes such as chronic disease and mortality, it is often advantageous to leverage data occurring at multiple time points in a patient's history. However, these data are often collected at irregular time intervals and varying frequencies, thus posing an analytical challenge. Here, we propose the use of large language models (LLMs) for robust temporal harmonization of clinical notes across multiple visits. We compare multiple state-of-the-art LLMs in their ability to generate useful information during time gaps, and evaluate performance in supervised deep learning models for clinical prediction.