Forecasting from Clinical Textual Time Series: Adaptations of the Encoder and Decoder Language Model Families

Authors

Noroozizadeh Shahriar, Kumar Sayantan, Weiss Jeremy C

Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

Division of Intramural Research, National Library of Medicine, Bethesda, MD, USA.

Publication information

ArXiv. 2025 Apr 20:arXiv:2504.10340v2.

DOI:

PMID:40735105

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12306807/

Abstract

Clinical case reports encode rich temporal patient trajectories that traditional machine learning methods, which rely on structured data, often underexploit. In this work, we introduce the problem of forecasting from textual time series, in which timestamped clinical findings, extracted via an LLM-assisted annotation pipeline, serve as the primary input for prediction. We systematically evaluate a diverse suite of models, including fine-tuned decoder-based large language models and encoder-based transformers, on event occurrence prediction, temporal ordering, and survival analysis. Our experiments show that encoder-based models consistently achieve higher F1 scores and better temporal concordance for both short- and long-horizon event forecasting, while fine-tuned masking approaches improve ranking performance. In contrast, instruction-tuned decoder models show a relative advantage in survival analysis, especially in early prognosis settings. Our sensitivity analyses further show the importance of time ordering, which requires constructing a clinical time series, compared with text ordering, the input format on which LLMs are classically trained. This highlights the additional benefit that can be obtained from time-ordered corpora, with implications for temporal tasks in the era of widespread LLM use.
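As a rough illustration of what a textual time series and the time-ordering versus text-ordering distinction might look like in practice, the Python sketch below represents a case report as timestamped findings, serializes them either by timestamp or by their position in the report, splits them into an observed history and a forecasting target window, and scores predicted event times with a simple pairwise concordance. Everything here is an illustrative assumption rather than the authors' pipeline: the `Finding` class, timestamps measured in hours relative to admission (t = 0), the `[Nh]` serialization format, the `cutoff`/`horizon` split, and the example findings are all hypothetical. The concordance function ignores censoring, so it is not a full survival C-index.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    text: str      # clinical finding as free text (hypothetical example data below)
    time: float    # timestamp in hours; assumed relative to admission at t = 0
    text_pos: int  # position at which the finding appears in the case report


def serialize(findings: List[Finding], order: str = "time") -> str:
    """Join findings into one model input string.

    order="time": sort by timestamp (time ordering), breaking ties by report position.
    order="text": keep the order in which findings appear in the report (text ordering).
    """
    if order == "time":
        ordered = sorted(findings, key=lambda f: (f.time, f.text_pos))
    elif order == "text":
        ordered = sorted(findings, key=lambda f: f.text_pos)
    else:
        raise ValueError(f"unknown order: {order}")
    return "\n".join(f"[{f.time:g}h] {f.text}" for f in ordered)


def forecast_split(
    findings: List[Finding], cutoff: float, horizon: float
) -> Tuple[List[Finding], List[Finding]]:
    """Return (history, targets): findings at or before the cutoff, and
    findings in the window (cutoff, cutoff + horizon] that a model would forecast."""
    history = [f for f in findings if f.time <= cutoff]
    targets = [f for f in findings if cutoff < f.time <= cutoff + horizon]
    return history, targets


def concordance(true_times: List[float], pred_times: List[float]) -> float:
    """Fraction of pairs whose predicted order matches the true order.

    Pairs with tied true times are skipped; tied predictions count as 0.5.
    No censoring is handled, unlike a survival C-index.
    """
    agree, total = 0.0, 0
    n = len(true_times)
    for i in range(n):
        for j in range(i + 1, n):
            if true_times[i] == true_times[j]:
                continue
            total += 1
            d_true = true_times[i] - true_times[j]
            d_pred = pred_times[i] - pred_times[j]
            if d_pred == 0:
                agree += 0.5
            elif (d_true > 0) == (d_pred > 0):
                agree += 1
    return agree / total if total else float("nan")


# Hypothetical case report: the text mentions admission first,
# then recounts earlier symptoms, then later treatment.
report = [
    Finding("admitted to the hospital", time=0, text_pos=0),
    Finding("chest pain", time=-24, text_pos=1),
    Finding("fever", time=-72, text_pos=2),
    Finding("started on antibiotics", time=2, text_pos=3),
]

print(serialize(report, "text"))  # report order: 0h, -24h, -72h, 2h
print(serialize(report, "time"))  # chronological: -72h, -24h, 0h, 2h
history, targets = forecast_split(report, cutoff=0, horizon=24)
print([f.text for f in targets])
print(concordance([1, 5, 10], [3, 3, 12]))
```

In this toy report, text ordering places admission before the earlier symptoms, while time ordering recovers the chronology; the sketch only shows the kind of reordering that constructing a clinical time series involves, not how the paper's models consume it.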
