Synthesis and quality assessment of combined time-series and static medical data using a real-world time-series generative adversarial network.

Affiliations

Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea.

Department of Radiology, Samsung Medical Center, Sungkyunkwan University, 81 Irwon-Ro, Gangnam-Gu, Seoul, 06351, Republic of Korea.

Publication information

Sci Rep. 2024 Aug 17;14(1):19064. doi: 10.1038/s41598-024-69812-7.

Abstract

This study addresses privacy challenges in the use of medical data, particularly the protection of personal information. To overcome this obstacle, the research focuses on data synthesis using a real-world time-series generative adversarial network (RTSGAN). A total of 53,005 synthetic data records were generated from a dataset of 15,799 patients with colorectal cancer. Quantitative evaluation of the synthetic data's quality gave the following results: the Hellinger distance ranged from 0 to 0.25; the train on synthetic, test on real (TSTR) and train on real, test on synthetic (TRTS) evaluations showed average areas under the curve of 0.99 and 0.98, respectively; and the propensity mean squared error was 0.223. The synthetic and real data were also similar under qualitative methods, including t-SNE and histogram analyses. When applied to predicting five-year survival in colorectal cancer patients, the synthetic data yielded performance comparable to that of models based on real data. Distance to closest record and membership inference tests were used to assess potential privacy exposure and revealed minimal risk. The study demonstrates that it is feasible to synthesize medical data, including time-series data, using RTSGAN, and that how accurately the synthetic data reflect the characteristics of real data can be evaluated through quantitative and qualitative methods as well as with real-world artificial intelligence models.
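As an illustration of one of the quantitative metrics named in the abstract, the following is a minimal sketch of the Hellinger distance between the distribution of a single categorical variable in real and synthetic data. It is not the authors' implementation: the helper names `hellinger_distance` and `category_probs`, the toy stage labels, and the choice to compare raw category frequencies are all assumptions made for this example. The distance is 0 for identical distributions and 1 for distributions with no overlap, so the reported range of 0 to 0.25 sits toward the similar end of the scale.

```python
import math
from collections import Counter


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    p and q are equal-length sequences of probabilities over the same categories.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")
    squared_diff = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_diff / 2.0)


def category_probs(values, categories):
    """Empirical probability of each category, in the order given by `categories`."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


# Hypothetical toy data (not taken from the study): one categorical variable.
real = ["I", "II", "II", "III", "III", "III", "IV", "II"]
synthetic = ["I", "II", "II", "III", "III", "IV", "IV", "II"]

# Use a shared category list so both probability vectors are aligned.
categories = sorted(set(real) | set(synthetic))
distance = hellinger_distance(
    category_probs(real, categories),
    category_probs(synthetic, categories),
)
print(f"Hellinger distance: {distance:.3f}")
```

For continuous variables, the same function could be applied after binning both datasets with identical bin edges; the binning used in the study is not stated in the abstract.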
