Huang Xueqing, Gu Tian
Department of Biostatistics, Columbia University Mailman School of Public Health, New York, NY.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:225-234. eCollection 2025.
Wearable devices collect complex structured data with high-dimensional and time-series features that are challenging for traditional models to handle efficiently. We propose EntroLLM, a new method that combines entropy measures with the low-dimensional representation (embedding) generated by large language models (LLMs) to enhance risk prediction from wearable device data. In EntroLLM, the entropy quantifies the variability of a subject's physical activity patterns, while the LLM embedding approximates the latent temporal structure. We evaluate the feasibility and performance of EntroLLM on NHANES data, predicting overweight status from demographics and physical activity measured by wearable devices. Results show that combining entropy with GPT-based embedding improves model performance compared to baseline models and other embedding techniques, raising the average AUC from 0.56 to 0.64. EntroLLM showcases the potential of combining entropy and LLM-based embedding and offers a promising approach to wearable device data analysis for predicting health outcomes.
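As a rough illustration of the entropy component described in the abstract, the sketch below computes a Shannon entropy over a histogram of one subject's activity readings and concatenates it with demographics and an embedding vector. The abstract does not state which entropy measure, binning, or feature layout EntroLLM uses, so the choice of histogram-based Shannon entropy, the `n_bins` parameter, and the helper names `shannon_entropy` and `build_features` are all assumptions made for illustration; the embedding is treated as a precomputed list of floats rather than obtained from any particular LLM API.

```python
import math
from collections import Counter


def shannon_entropy(activity_values, n_bins=10):
    """Shannon entropy (in bits) of a histogram of activity readings.

    Assumption: equal-width binning over the subject's own value range.
    The paper may use a different entropy measure.
    """
    if not activity_values:
        raise ValueError("activity_values must be non-empty")
    lo, hi = min(activity_values), max(activity_values)
    if hi == lo:
        # A constant signal has no variability.
        return 0.0
    width = (hi - lo) / n_bins
    # Assign each reading to a bin; the maximum value goes in the last bin.
    bins = Counter(
        min(int((x - lo) / width), n_bins - 1) for x in activity_values
    )
    n = len(activity_values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def build_features(demographics, activity_values, embedding):
    """Concatenate demographics, the entropy scalar, and an embedding.

    Hypothetical feature layout: the embedding is assumed to be a
    precomputed list of floats supplied by the caller.
    """
    entropy = shannon_entropy(activity_values)
    return list(demographics) + [entropy] + list(embedding)
```

Under these assumptions, a subject whose activity is spread evenly across intensity levels receives a higher entropy than one whose readings cluster at a single level, and the resulting feature vector could be passed to any standard classifier.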