Wang Jing, Weiss Jeremy C
National Library of Medicine, Bethesda, Maryland, USA.
ArXiv. 2025 Apr 15:arXiv:2504.12350v1.
Timing of clinical events is central to characterization of patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series: structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390, respectively) of ten case reports randomly sampled from the PubMed open-access (PMOA) corpus (N=152,974) and assess inter-LLM agreement (n=3,103; N=93). We find that the LLMs have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). By establishing the task, annotation, and assessment systems, and by demonstrating high concordance, this work may serve as a benchmark for leveraging the PMOA corpus for temporal analytics. Code is available at: https://github.com/jcweiss2/LLM-Timeline-PMOA/.
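To make the two reported quantities concrete, the following is a minimal sketch of how event recall and temporal concordance could be computed for one case report. It is not the authors' implementation (see the linked repository for that). It assumes that a textual time series is a list of (event text, timestamp) pairs, that timestamps are hours relative to admission, that events are matched by case- and whitespace-normalized exact string equality, and that concordance is the fraction of matched event pairs whose time ordering agrees between the two annotations. The function names and the example data are hypothetical.

```python
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace (assumed matching rule, not the paper's)."""
    return " ".join(text.lower().split())


def event_recall(reference, predicted):
    """Fraction of reference events that also appear in the predicted annotation."""
    if not reference:
        return 0.0
    predicted_events = {normalize(event) for event, _ in predicted}
    hits = sum(1 for event, _ in reference if normalize(event) in predicted_events)
    return hits / len(reference)


def temporal_concordance(reference, predicted):
    """Pairwise ordering agreement (a c-index style score) over matched events.

    Pairs tied in the reference are skipped; pairs tied only in the
    prediction count as half concordant.
    """
    predicted_time = {normalize(event): time for event, time in predicted}
    matched = [
        (time, predicted_time[normalize(event)])
        for event, time in reference
        if normalize(event) in predicted_time
    ]
    concordant = 0.0
    comparable = 0
    for (ref_a, pred_a), (ref_b, pred_b) in combinations(matched, 2):
        if ref_a == ref_b:
            continue  # ordering undefined in the reference
        comparable += 1
        if pred_a == pred_b:
            concordant += 0.5
        elif (ref_a < ref_b) == (pred_a < pred_b):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


# Hypothetical annotations; times are hours relative to admission (assumed unit).
manual = [("fever", -72), ("admitted", 0), ("CT scan", 4),
          ("appendectomy", 10), ("discharged", 96)]
llm = [("Fever", -48), ("admitted", 0), ("ct scan", 12),
       ("appendectomy", 8), ("follow-up visit", 400)]

print(f"event recall: {event_recall(manual, llm):.2f}")
print(f"temporal concordance: {temporal_concordance(manual, llm):.2f}")
```

In this toy example the LLM annotation recovers four of the five manual events, and five of the six matched event pairs are ordered consistently. Exact string matching is a deliberately strict simplification; free-text events would in practice need a softer matching rule.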