Oszkinat Clemens, Luczak Susan E, Rosen I Gary
Department of Mathematics, University of Southern California, Los Angeles, 90089, CA, USA.
Department of Psychology, University of Southern California, Los Angeles, 90089, CA, USA.
Neural Comput Appl. 2022 Nov;34(21):18933-18951. doi: 10.1007/s00521-022-07505-w. Epub 2022 Jun 26.
The problem of estimating breath alcohol concentration based on transdermal alcohol biosensor data is considered. Transdermal alcohol concentration provides a promising alternative to classical methods such as breathalyzers or drinking diaries. A physics-informed long short-term memory (LSTM) network with covariates is developed to solve the estimation problem. The data-driven nature of an LSTM is augmented with a first-principles, physics-based population model for the diffusion of ethanol through the epidermal layer of the skin. The population model, formulated in an abstract parabolic framework, appears as part of a regularization term in the loss function of the LSTM. During training, the model is thus encouraged both to fit the data and to produce physically meaningful outputs. To deal with the high variation observed in the data, a mechanism for the uncertainty quantification of the estimates based on a recently discovered relation between Monte Carlo dropout and Bayesian learning is used. The physics-based population model and the LSTM are trained and tested using breath and transdermal alcohol data collected under controlled laboratory conditions in four sessions from 40 orally dosed participants (50% female, ages 21 to 33 years, 35% with BMI above 25.0), resulting in 256 usable drinking episodes partitioned into training and testing sets. Body measurement (e.g., BMI, hip-to-waist ratio), personal (e.g., sex, age, race), drinking behavior (e.g., frequent, rare), and environmental (e.g., temperature, humidity) covariates were also collected from participants. The importance of various covariates in the estimation is investigated using Shapley values.
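To make the role of the regularization term concrete, one plausible form of such a physics-informed loss is sketched here. The notation is an assumption introduced for illustration and is not taken from the paper: \hat{u}_\theta is the network's breath alcohol estimate with weights \theta, u is the measured breath alcohol concentration, y is the measured transdermal alcohol concentration, \mathcal{F} is the population diffusion model viewed as a map from breath alcohol to transdermal alcohol, and \lambda \ge 0 is a weight balancing the two terms.

```latex
\mathcal{L}(\theta)
  = \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{u}_\theta(t_i) - u(t_i)\bigr)^2
  + \lambda\,\frac{1}{N}\sum_{i=1}^{N}\bigl(\mathcal{F}[\hat{u}_\theta](t_i) - y(t_i)\bigr)^2
```

Under these assumptions, the first term rewards fitting the data, while the second penalizes estimates that the physical model cannot reconcile with the observed transdermal signal. Setting \lambda = 0 recovers a purely data-driven loss.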
It is shown that the physics-informed LSTM network can be successfully applied to drinking episodes from both the training and test sets, and that the physics-based information leads to better generalization on new drinking episodes, with the uncertainty quantification yielding credible bands that effectively capture the true signal. Compared to two machine learning models from previous studies, the proposed model reduces relative error in estimated breath alcohol concentration by 58% and 72%, and relative peak error by 33% and 76%.
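The credible bands mentioned above can be illustrated with a minimal Monte Carlo dropout sketch. It is not the authors' implementation: the one-hidden-layer network is a hypothetical stand-in for the trained LSTM, the input is a placeholder signal rather than real data, and the names `mc_dropout_band` and `make_toy_forward` are invented for this example. The idea shown is only that dropout is kept active at prediction time, the forward pass is repeated, and percentiles of the resulting draws form a band.

```python
import numpy as np


def mc_dropout_band(forward, x, n_samples=200, level=0.9, seed=0):
    """Repeat a stochastic forward pass and summarize the draws as a band."""
    rng = np.random.default_rng(seed)
    # Each call applies a fresh random dropout mask, giving one draw per pass.
    draws = np.stack([forward(x, rng) for _ in range(n_samples)])
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail], axis=0)
    return draws.mean(axis=0), lower, upper


def make_toy_forward(n_hidden=32, p_drop=0.2, seed=1):
    """Hypothetical one-hidden-layer network standing in for the LSTM."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(1, n_hidden))
    w2 = rng.normal(size=(n_hidden, 1)) / n_hidden

    def forward(x, drop_rng):
        h = np.tanh(x[:, None] @ w1)
        # Dropout stays on at prediction time; survivors are rescaled.
        mask = drop_rng.random(h.shape) >= p_drop
        h = h * mask / (1.0 - p_drop)
        return (h @ w2)[:, 0]

    return forward


# Placeholder input signal (not real transdermal alcohol data).
tac = np.linspace(0.0, 1.0, 50)
mean, lower, upper = mc_dropout_band(make_toy_forward(), tac)
```

Here `mean` plays the role of the point estimate and `lower`/`upper` bound a 90% band at each time step. The number of passes and the band level are free choices in this sketch.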