Akagi Yu, Seki Tomohisa, Kawazoe Yoshimasa, Takiguchi Toru, Ohe Kazuhiko
Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Japan.
AMIA Annu Symp Proc. 2025 May 22;2024:124-133. eCollection 2024.
Advances in artificial intelligence have propelled the implementation of general-purpose multitasking agents called foundation models. However, foundation models have struggled to handle structured longitudinal medical data because of the mixed data types and variable timestamps in these data. Acquiring sufficiently large training datasets is another obstacle. This study proposes a generative foundation model that handles patient trajectory data of variable lengths with mixed data types (categorical and continuous variables). Additionally, we propose a data pipeline to supply real-world data large enough to support foundation models. Using a reproducible data pipeline scheme that leverages a national HL7 message standard, we obtained a large clinical dataset locally. The trained model acquired the ability to suggest clinically relevant medical concepts and continuous variables for general purposes. The model also synthesized a database of more than 10,000 realistic patient trajectories. Our results suggest promising downstream clinical applications of the foundation model.