Igbe Tobore, Kovatchev Boris
Center for Diabetes Technology, School of Medicine, University of Virginia, Charlottesville, VA, USA.
J Diabetes Sci Technol. 2025 Mar 20:19322968251323913. doi: 10.1177/19322968251323913.
The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research. This article presents a novel approach to encoding a CGM daily profile into a CGM string and CGM text that compresses the data while preserving the information carried by clinical metrics.
Eight alphabets were defined to represent glucose ranges. For each alphabet, the Akaike information criterion (AIC) was derived from the encoding error and the compression ratio was estimated, in order to determine the optimal alphabet for encoding the CGM daily profile. The analysis used data from six distinct studies with different treatment modalities, involving individuals with type 1 diabetes (T1D), with type 2 diabetes (T2D), or without diabetes. The data set was divided into 70% for training and 30% for validation.
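As a rough illustration of this encoding step, the sketch below maps glucose readings to letters and scores one alphabet. It is not the authors' implementation: the bin edges, the per-letter decoding values, the least-squares form of the AIC (n ln(RSS/n) + 2k, with k taken as the number of letters), and the bit-based compression ratio are all assumptions chosen for the example.

```python
import math
from bisect import bisect_right

# Hypothetical 9-letter alphabet: 8 bin edges (mg/dL) give 9 glucose ranges.
EDGES = [54, 70, 100, 140, 180, 250, 300, 400]
LETTERS = "ABCDEFGHI"
# Hypothetical representative glucose value (mg/dL) used to decode each letter.
CENTERS = [45, 62, 85, 120, 160, 215, 275, 350, 420]


def encode(profile, edges=EDGES, letters=LETTERS):
    """Turn a list of glucose readings into a string, one letter per reading."""
    return "".join(letters[bisect_right(edges, g)] for g in profile)


def decode(text, letters=LETTERS, centers=CENTERS):
    """Map each letter back to the representative value of its range."""
    return [centers[letters.index(ch)] for ch in text]


def aic_from_error(profile, decoded, n_letters):
    """Least-squares AIC (assumed form): n * ln(RSS / n) + 2 * k."""
    n = len(profile)
    rss = sum((a - b) ** 2 for a, b in zip(profile, decoded))
    rss = max(rss, 1e-12)  # avoid log(0) when the reconstruction is exact
    return n * math.log(rss / n) + 2 * n_letters


def compression_ratio(n_letters, bits_per_reading=16):
    """Assumed definition: bits per raw reading / bits per encoded letter."""
    return bits_per_reading / math.ceil(math.log2(n_letters))


if __name__ == "__main__":
    readings = [50, 65, 90, 130, 170, 200, 280, 350, 450]
    text = encode(readings)
    print(text)
    print(aic_from_error(readings, decode(text), len(LETTERS)))
    print(compression_ratio(len(LETTERS)))
```

Under these assumptions, repeating the scoring for each candidate alphabet and keeping the one with the lowest AIC would mirror the selection procedure described above.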
Results from the training data show that a 9-letter alphabet was optimal for encoding daily CGM profiles in T1D or T2D, yielding the lowest AIC score and thus the least information loss. For individuals without diabetes, fewer letters were needed, as expected given the lower variability of the data. Further testing with the Pearson correlation showed that the 9-letter alphabet approximated the coefficient of variation, with correlations between 0.945 and 0.965.
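The correlation check reported here can be sketched as follows. This is a minimal example under stated assumptions, not the authors' code: it assumes the coefficient of variation (CV) is computed once from each original daily profile and once from its decoded counterpart, and that the Pearson correlation is then taken across profiles. The function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV in percent: population standard deviation divided by the mean."""
    return 100 * statistics.pstdev(values) / statistics.fmean(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cv_agreement(original_profiles, decoded_profiles):
    """Correlate the CVs of original profiles with the CVs of decoded ones."""
    cv_orig = [coefficient_of_variation(p) for p in original_profiles]
    cv_dec = [coefficient_of_variation(p) for p in decoded_profiles]
    return pearson(cv_orig, cv_dec)
```

A value of `cv_agreement` close to 1 would indicate that the encoding preserves the glucose variability of the original profiles.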
Encoding CGM data into text could enhance the classification of CGM profiles and enable the use of well-established search engines with CGM data. Other potential applications include predictive modeling, anomaly detection, indexing, trend analysis, or future generative artificial intelligence applications for diabetes research and clinical practice.