Department of Biomedical Informatics, Columbia University, 622 West 168th Street, PH-20, New York, NY, USA; Department of Pediatrics, Division of Informatics, University of Colorado Medicine, Mail: F443, 13199 E. Montview Blvd. Ste: 210-12, Aurora, CO 80045 USA.
Department of Computational and Mathematical Sciences, California Institute of Technology, 1200 E California Blvd M/C 305-16, Pasadena, CA 91125 USA.
Math Biosci. 2019 Oct;316:108242. doi: 10.1016/j.mbs.2019.108242. Epub 2019 Aug 24.
One way to inject knowledge into clinically impactful forecasting is to use data assimilation, a nonlinear regression that projects data onto a mechanistic physiologic model rather than onto a set of generic functions such as neural networks. Such regressions have the advantage of remaining useful with particularly sparse, non-stationary clinical data. However, physiological models are often nonlinear and can have many parameters, leading to potential problems with parameter identifiability, that is, the ability to find a unique set of parameters that minimizes forecasting error. Identifiability problems can be reduced or eliminated by estimating fewer parameters, but doing so also reduces the flexibility of the model and hence increases forecasting error. We propose a method, the parameter Houlihan, that combines traditional machine learning techniques with data assimilation to select the set of model parameters that minimizes forecasting error while reducing identifiability problems. The method worked well: for our cohort, data assimilation-based glucose forecasts and estimates using the Houlihan-selected parameter sets generally had lower forecasting errors than those obtained with other parameter selection methods, such as by-hand parameter selection. Nevertheless, the forecast with the lowest forecast error does not always accurately represent physiology, but further advancements of the algorithm provide a path toward improving physiologic fidelity as well. Our hope is that this methodology represents a first step toward combining machine learning with data assimilation and, by helping select the right parameters to estimate, provides a lower-barrier entry point for using data assimilation with clinical data.
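To make the identifiability problem concrete, the sketch below uses a hypothetical toy model, y = a * b * x, which is not the glucose model from the paper and does not implement the parameter Houlihan. Because only the product a*b affects the output, many (a, b) pairs fit the data equally well, so no unique minimizer exists. Holding b fixed (estimating fewer parameters) makes a uniquely determined. The data are synthetic and generated inside the example.

```python
import random

# Hypothetical toy model (not the paper's glucose model): y = a * b * x.
# Only the product a*b affects the output, so a and b cannot be estimated separately.
def model(x, a, b):
    return a * b * x

random.seed(0)
xs = [i / 19 for i in range(20)]
ys = [model(x, 2.0, 3.0) + random.gauss(0.0, 0.05) for x in xs]

def sse(a, b):
    """Sum of squared errors between the toy model and the synthetic data."""
    return sum((model(x, a, b) - y) ** 2 for x, y in zip(xs, ys))

# Two different parameter sets with the same product fit the data equally well,
# so no unique minimizer exists when both a and b are estimated.
print(sse(2.0, 3.0) == sse(6.0, 1.0))  # True

# Estimating fewer parameters: hold b fixed and solve the 1-D least-squares
# problem for a in closed form, a = sum(x*y) / (b * sum(x^2)).
b_fixed = 1.0
a_hat = sum(x * y for x, y in zip(xs, ys)) / (b_fixed * sum(x * x for x in xs))
print(round(a_hat, 2))  # near 6.0, the identifiable product a*b
```

In this toy case fixing b costs nothing, because a absorbs its effect. In a realistic physiological model, fixing a parameter at an incorrect value can instead raise forecasting error, which is the trade-off between identifiability and flexibility described above.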