Simulating Realistic Continuous Glucose Monitor Time Series By Data Augmentation.

Authors

Gomez Louis A, Toye Adedolapo Aishat, Hum R Stanley, Kleinberg Samantha

Affiliations

Stevens Institute of Technology, Hoboken, NJ, USA.

The Montreal Children's Hospital, McGill University Health Centre, Montreal, QC, Canada.

Publication information

J Diabetes Sci Technol. 2025 Jan;19(1):114-122. doi: 10.1177/19322968231181138. Epub 2023 Jun 23.

Abstract

BACKGROUND

Simulated data are a powerful tool for research, enabling benchmarking of blood glucose (BG) forecasting and control algorithms. However, expert-created models give an unrealistic view of real-world performance because they lack the features that make real data challenging, while black-box approaches such as generative adversarial networks do not allow systematic tests to diagnose model performance.

METHODS

To address this, we propose a method that learns missingness and error properties of continuous glucose monitor (CGM) data collected from people with type 1 diabetes (OpenAPS, OhioT1DM, RCT, and Racial-Disparity), and then augments simulated BG data with these properties. On the task of BG forecasting, we test how well our method brings performance closer to that of real CGM data compared with current simulation practices for missing data (random dropout) and error (Gaussian noise, CGM error model).
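The two baseline practices named above, random dropout for missing data and Gaussian noise for error, can be illustrated with a minimal sketch. This is not the authors' code. The function names, the sinusoidal stand-in for a simulated BG trace, the 5-minute sampling rate, and the values of `drop_prob` and `sigma` are all assumptions chosen for illustration.

```python
import math
import random


def random_dropout(bg, drop_prob, rng):
    """Baseline missingness: drop each sample independently with probability drop_prob."""
    return [math.nan if rng.random() < drop_prob else v for v in bg]


def gaussian_noise(bg, sigma, rng):
    """Baseline error: add zero-mean Gaussian noise with standard deviation sigma (mg/dL)."""
    return [v + rng.gauss(0.0, sigma) for v in bg]


rng = random.Random(0)

# Hypothetical simulated BG trace: one day at 5-minute sampling (288 points).
simulated_bg = [120.0 + 30.0 * math.sin(2 * math.pi * i / 288) for i in range(288)]

# Apply the error baseline first, then the missingness baseline.
noisy = gaussian_noise(simulated_bg, sigma=10.0, rng=rng)
augmented = random_dropout(noisy, drop_prob=0.1, rng=rng)

n_missing = sum(math.isnan(v) for v in augmented)
print(f"{n_missing} of {len(augmented)} samples dropped")
```

Both baselines treat every time point independently, so they cannot produce long contiguous gaps or error that depends on context. That limitation is the motivation the abstract gives for learning missingness and error properties from real CGM data instead.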

RESULTS

When the effects of missing data and error on simulated BG were tested individually, our methods had the smallest performance difference versus real data in most cases, compared with random dropout and Gaussian noise. When combined, our approach was significantly better than Gaussian noise and random dropout for all data sets except OhioT1DM. Our error model significantly improved results on diverse data sets.

CONCLUSIONS

We find a significant gap between BG forecasting performance on simulated and real data, and our method can be used to close this gap. This will enable researchers to rigorously test algorithms and provide realistic estimates of real-world performance without overfitting to real data or incurring the expense of data collection.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8f2/11688677/5ebd7a324028/10.1177_19322968231181138-fig1.jpg
