Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis.

Author information

Noroozizadeh Shahriar, Weiss Jeremy C

Publication information

ArXiv. 2025 Apr 12:arXiv:2504.12326v1.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12306827/

Abstract

Clinical case reports and discharge summaries may be the most complete and accurate summarization of patient encounters, yet they are finalized, i.e., timestamped after the encounter. Complementary structured data streams become available sooner but suffer from incompleteness. To train models and algorithms on more complete and temporally fine-grained data, we construct a pipeline to phenotype, extract, and annotate time-localized findings within case reports using large language models. We apply our pipeline to generate an open-access textual time series corpus for Sepsis-3 comprising 2,139 case reports from the PubMed Open Access (PMOA) Subset. To validate our system, we apply it to PMOA and timeline annotations from I2B2/MIMIC-IV and compare the results to physician-expert annotations. We show high recovery rates of clinical findings (event match rates: O1-preview, 0.755; Llama 3.3 70B Instruct, 0.753) and strong temporal ordering (concordance: O1-preview, 0.932; Llama 3.3 70B Instruct, 0.932). Our work characterizes the ability of LLMs to time-localize clinical findings in text, illustrating the limitations of LLM use for temporal reconstruction and providing several potential avenues of improvement via multimodal integration.
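
The abstract reports two validation metrics, an event match rate and a temporal-ordering concordance, without defining them here. The following is a minimal sketch of how such metrics could be computed, not the authors' implementation. It assumes that events are matched by exact case-insensitive string comparison and that concordance is a pairwise, c-index-style score over events found in both timelines. The function names, the example events, and the times (hours relative to admission) are all hypothetical.

```python
from itertools import combinations


def event_match_rate(reference_events, predicted_events):
    """Fraction of reference events that also appear in the prediction.

    Assumption: exact, case-insensitive string matching. The paper may
    use a softer matching rule; this is only an illustration.
    """
    if not reference_events:
        return 0.0
    predicted = {e.strip().lower() for e in predicted_events}
    matched = sum(1 for e in reference_events if e.strip().lower() in predicted)
    return matched / len(reference_events)


def temporal_concordance(reference_times, predicted_times):
    """Pairwise ordering agreement over events present in both timelines.

    Both arguments map event name -> time. Pairs tied in the reference
    are skipped; pairs tied only in the prediction count as 0.5.
    """
    shared = sorted(set(reference_times) & set(predicted_times))
    concordant = 0.0
    comparable = 0
    for a, b in combinations(shared, 2):
        ref_diff = reference_times[a] - reference_times[b]
        if ref_diff == 0:
            continue  # no reference ordering to agree with
        comparable += 1
        pred_diff = predicted_times[a] - predicted_times[b]
        if pred_diff == 0:
            concordant += 0.5
        elif (ref_diff > 0) == (pred_diff > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Hypothetical timelines, in hours relative to admission.
    reference = {"fever": -48, "hypotension": 0, "intubation": 6, "discharge": 240}
    predicted = {"fever": -24, "hypotension": 0, "intubation": 0, "extubation": 72}
    print(event_match_rate(list(reference), list(predicted)))
    print(temporal_concordance(reference, predicted))
```

Under these assumptions, the match rate measures how many annotated findings were recovered, while the concordance measures ordering only and ignores the size of any timing error.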
