Urteaga Iñigo, Li Kathy, Shea Amanda, Vitzthum Virginia J, Wiggins Chris H, Elhadad Noémie
Department of Applied Physics and Applied Mathematics, Data Science Institute, Columbia University, New York, NY, USA.
Clue by BioWink, Adalbertstraße 7-8, 10999 Berlin, Germany.
Proc Mach Learn Res. 2021 Aug;149:535-566.
We explore how to quantify uncertainty when designing predictive models for healthcare so that they provide well-calibrated results. Uncertainty quantification and calibration are critical in medicine, as one must not only accommodate the variability of the underlying physiology, but also adjust to the uncertain data collection and reporting process. This occurs not only in the context of electronic health records (i.e., the clinical documentation process), but also in mobile health (i.e., user-specific self-tracking patterns must be accounted for). In this work, we show that accurate uncertainty estimation is directly relevant to an important health application: the prediction of menstrual cycle length, based on self-tracked information. We take advantage of a flexible generative model that accommodates under-dispersed distributions via two degrees of freedom to fit the mean and variance of the observed cycle lengths. From a machine learning perspective, our work showcases how flexible generative models can not only provide state-of-the-art predictive accuracy, but also enable well-calibrated predictions. From a healthcare perspective, we demonstrate that with flexible generative models, not only can we accommodate the idiosyncrasies of mobile health data, but we can also adjust the predictive uncertainty to per-user cycle length patterns. We evaluate the proposed model on real-world cycle length data collected by one of the most popular menstrual trackers worldwide, and demonstrate how the proposed generative model provides accurate and well-calibrated cycle length predictions. Providing meaningful, less uncertain cycle length predictions is beneficial for menstrual health researchers, mobile health users and developers, as it may help design more usable mobile health solutions.
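To illustrate what "two degrees of freedom to fit the mean and variance" can look like for an under-dispersed count variable, the following is a minimal sketch. The abstract does not name the distribution, so the generalized Poisson family used here is an assumption chosen for illustration, and the parameter values (`lam = 45.0`, `xi = -0.5`) are hypothetical rather than fitted to any data. With a negative `xi`, the variance falls below the mean, which a plain Poisson (variance equal to mean) cannot express.

```python
import math


def generalized_poisson_pmf(lam, xi, max_x=200):
    """Numerically normalized generalized Poisson PMF on 0..max_x.

    Unnormalized weight: lam * (lam + xi*x)**(x-1) * exp(-(lam + xi*x)) / x!
    For xi < 0 the support is truncated where lam + xi*x <= 0, so the
    weights are renormalized to sum to one.
    """
    weights = []
    for x in range(max_x + 1):
        base = lam + xi * x
        if base <= 0:
            weights.append(0.0)  # outside the truncated support
            continue
        log_w = (math.log(lam) + (x - 1) * math.log(base)
                 - base - math.lgamma(x + 1))
        weights.append(math.exp(log_w))
    total = sum(weights)
    return [w / total for w in weights]


def moments(pmf):
    """Mean and variance of a PMF given as a list indexed by count."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, var


# Hypothetical parameters: lam sets the scale, xi < 0 induces under-dispersion.
lam, xi = 45.0, -0.5
pmf = generalized_poisson_pmf(lam, xi)
mean, var = moments(pmf)

# A Poisson with the same mean would have variance equal to that mean.
print(f"mean = {mean:.2f} days, variance = {var:.2f} (Poisson would give {mean:.2f})")
```

Because the two parameters move the mean and the variance separately, a model of this kind can give tighter predictive intervals for a user whose cycle lengths vary little, and wider ones for a user whose cycle lengths vary more.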