QAMT: An LLM-Based Framework for Quality-Assured Medical Time-Series Data Generation.

Authors

Luo Yi, Zhang Yong, Xing Chunxiao, Ren Peng, Liu Xinhao

Affiliations

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.

BNRist, DCST, RIIT, Tsinghua University, Beijing 100084, China.

Publication information

Sensors (Basel). 2025 Sep 3;25(17):5482. doi: 10.3390/s25175482.

Abstract

The extensive deployment of diverse sensors in hospitals has led to the collection of many kinds of medical time-series data. However, real-world medical time-series data suffer from limited volume, poor quality, and privacy concerns, which degrade performance in downstream tasks such as medical research and clinical decision-making. Existing studies use generated medical data as a supplement to, or substitute for, real-world data. Medical time-series data are inherently complex. They combine temporal data, such as laboratory measurements, with static event data, such as demographics and clinical outcomes, and each patient's temporal data are influenced by that patient's static event data. This intrinsic complexity makes generating high-quality medical time-series data particularly challenging. Traditional methods typically employ Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), but these methods struggle to generate high-quality static event data and often lack interpretability. Large language models (LLMs) now open new opportunities for medical data generation, but they have difficulty generating temporal data and handling domain-specific generation tasks. In this study, we propose QAMT, the first LLM-based framework for modularly generating medical time-series data, which produces quality-assured data and keeps the generation process interpretable. QAMT constructs a reliable health knowledge graph to supply medical expertise to the LLMs and uses dual modules that simultaneously generate static event data and temporal data, which together form high-quality medical time-series data. QAMT also introduces a quality assurance module that evaluates the generated data. Unlike existing methods, QAMT preserves the interpretability of the data generation process. Experimental results show that QAMT generates higher-quality medical time-series data than existing methods.
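The abstract describes QAMT only at the architectural level. The following is a minimal Python sketch, not the authors' implementation, of the data flow it outlines. A static-event module runs first, a temporal module generates values conditioned on that static record, and a quality-assurance step filters the results. Every name here is hypothetical, including `TOY_KNOWLEDGE`, `generate_static`, `generate_temporal`, `quality_check`, and `generate_dataset`. The numeric ranges are illustrative placeholders, not medical facts. Seeded random draws stand in for the LLM and knowledge-graph components, which the abstract does not specify.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy "knowledge" table standing in for the health knowledge graph:
# maps a static attribute (a diagnosis label) to a plausible range for one lab value.
# Labels and ranges are illustrative placeholders only.
TOY_KNOWLEDGE = {
    "condition_a": (4.0, 6.0),
    "condition_b": (7.0, 10.0),
}


@dataclass
class PatientRecord:
    static: dict                                   # static event data (demographics, diagnosis)
    temporal: list = field(default_factory=list)   # list of (time_step, lab_value) pairs


def generate_static(rng: random.Random) -> dict:
    # Hypothetical stand-in for the static-event generation module (an LLM call in QAMT).
    return {"age": rng.randint(18, 90), "diagnosis": rng.choice(sorted(TOY_KNOWLEDGE))}


def generate_temporal(static: dict, rng: random.Random, steps: int = 5) -> list:
    # Hypothetical stand-in for the temporal module. Values are conditioned on the
    # static record through the knowledge table, with Gaussian noise that can
    # occasionally fall outside the expected range.
    low, high = TOY_KNOWLEDGE[static["diagnosis"]]
    mean, sd = (low + high) / 2, (high - low) / 4
    return [(t, rng.gauss(mean, sd)) for t in range(steps)]


def quality_check(record: PatientRecord) -> bool:
    # Hypothetical stand-in for the quality-assurance module: accept a record only if
    # every temporal value agrees with the knowledge-derived range for its diagnosis.
    low, high = TOY_KNOWLEDGE[record.static["diagnosis"]]
    return all(low <= value <= high for _, value in record.temporal)


def generate_dataset(n: int, seed: int = 0) -> list:
    # Generate static data first, then temporal data conditioned on it,
    # and keep only records that pass the quality check.
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        static = generate_static(rng)
        record = PatientRecord(static, generate_temporal(static, rng))
        if quality_check(record):
            accepted.append(record)
    return accepted


if __name__ == "__main__":
    for rec in generate_dataset(3):
        print(rec.static, [round(v, 2) for _, v in rec.temporal])
```

The sketch uses static-first ordering because the abstract states that each patient's temporal data depend on their static event data. Placing the quality check after generation mirrors the abstract's separate quality assurance module. Whether QAMT filters, repairs, or only scores rejected records is not stated in the abstract.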

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0480/12431261/cae42e4339f3/sensors-25-05482-g001.jpg
