Radboud University, ICIS, Nijmegen, The Netherlands.
University of Milano-Bicocca, Italy.
Artif Intell Med. 2019 Apr;95:104-117. doi: 10.1016/j.artmed.2018.10.002. Epub 2019 Jan 22.
Recently, mobile devices, such as smartphones, have been introduced into healthcare research to replace paper diaries as data-collection tools in the home environment. Such devices support collecting patient data at different time points over a long period, resulting in clinical time-series data with high temporal complexity, such as time irregularities. Analysis of such time series poses new challenges for machine-learning techniques. The clinical context for the research discussed in this paper is home monitoring in chronic obstructive pulmonary disease (COPD).
The goal of the present research is to find out which properties of temporal Bayesian network models allow them to cope best with irregularly spaced multivariate clinical time-series data.
Two mainstream temporal Bayesian network models of multivariate clinical time series are studied: dynamic Bayesian networks, where the system is described as a snapshot at discrete time points, and continuous-time Bayesian networks, where transitions between states are modeled in continuous time. Their capability of learning from clinical time series that vary in nature is extensively studied. In order to compare the two temporal Bayesian network types for regularly and irregularly spaced time-series data, three typical ways of observing time-series data were investigated: (1) regularly spaced in time with a fixed rate; (2) irregularly spaced and missing completely at random at discrete time points; (3) irregularly spaced and missing at random at discrete time points. In addition, similar experiments were carried out using real-world COPD patient data where observations are unevenly spaced.
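The three observation schemes can be illustrated with a minimal simulation sketch. It is not the experimental setup of the paper: the single binary variable, the transition rates, the unit-spaced grid, and the observation probabilities are all assumptions chosen for illustration, and the helper names (`simulate_ctmc`, `state_at`, `observe`) are hypothetical. The "missing at random" scheme is modeled here by letting the chance of recording a value depend only on the last recorded value.

```python
import random


def simulate_ctmc(rates, t_end, rng, state=0):
    """Simulate a two-state continuous-time Markov chain.

    rates[s] is the (assumed) rate of leaving state s.
    Returns the trajectory as a list of (jump_time, new_state) pairs.
    """
    t = 0.0
    path = [(t, state)]
    while True:
        t += rng.expovariate(rates[state])  # exponential sojourn time
        if t >= t_end:
            return path
        state = 1 - state
        path.append((t, state))


def state_at(path, t):
    """Return the state occupied at time t; path is sorted by jump time."""
    current = path[0][1]
    for jump_time, s in path:
        if jump_time > t:
            break
        current = s
    return current


def observe(path, n_points, scheme, rng, p_obs=0.6):
    """Sample the trajectory on the grid 0, 1, ..., n_points - 1.

    scheme is 'regular' (every grid point), 'mcar' (each point kept with
    fixed probability p_obs), or 'mar' (keep probability depends on the
    last *observed* state, never on the unobserved current one).
    """
    observations = []
    last_seen = None
    for k in range(n_points):
        s = state_at(path, float(k))
        if scheme == "regular":
            keep = True
        elif scheme == "mcar":
            keep = rng.random() < p_obs
        elif scheme == "mar":
            # Illustrative assumption: more frequent recording after state 1.
            p = 0.9 if last_seen == 1 else 0.3
            keep = k == 0 or rng.random() < p
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if keep:
            observations.append((k, s))
            last_seen = s
    return observations


rng = random.Random(0)
path = simulate_ctmc((0.5, 1.0), t_end=20.0, rng=rng)
regular = observe(path, 20, "regular", rng)
mcar = observe(path, 20, "mcar", rng)
mar = observe(path, 20, "mar", rng)
```

Under schemes (2) and (3) the gaps between consecutive recorded time points vary, which is the kind of irregular spacing the two model families are compared on.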
For regularly spaced time series, the dynamic Bayesian network models outperform the continuous-time Bayesian networks. Similarly, if the data is missing completely at random, discrete-time models outperform continuous-time models in most situations. For more realistic settings where data is not missing completely at random, the situation is more complicated. In simulation experiments, both models perform similarly if there is strong prior knowledge available about the missing data distribution. Otherwise, continuous-time Bayesian networks perform better. In experiments with unevenly spaced real-world data, we surprisingly found that a dynamic Bayesian network where time is ignored performs similarly to a continuous-time Bayesian network.
The results confirm the conventional wisdom that discrete-time Bayesian networks are appropriate when learning from regularly spaced clinical time series. Similarly, we found that for time series where the missingness occurs completely at random, dynamic Bayesian networks are an appropriate choice. However, for the complex clinical time-series data that motivated this research, the continuous-time models are at least competitive with, and sometimes better than, their discrete-time counterparts. Furthermore, continuous-time models have the additional benefit of providing more fine-grained predictions than discrete-time models, which will be of practical relevance in clinical applications.
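The point about fine-grained predictions can be sketched for the simplest possible case, a single binary variable. This is an illustration under stated assumptions, not the models used in the paper: the rates `a` (state 0 to 1) and `b` (state 1 to 0) are made-up values and the function names are hypothetical. A two-state continuous-time Markov process has a closed-form transition matrix for any elapsed time t, whereas a discrete-time model with a fixed step can only be propagated to whole multiples of that step.

```python
import math


def ctmc_transition(a, b, t):
    """Transition matrix P(t) of a two-state continuous-time Markov chain.

    a is the rate 0 -> 1 and b is the rate 1 -> 0; entry [i][j] is the
    probability of being in state j at time t given state i at time 0.
    """
    total = a + b
    decay = math.exp(-total * t)
    p01 = a / total * (1.0 - decay)
    p10 = b / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def discrete_k_step(step_matrix, k):
    """k-step transition matrix of a discrete-time chain (matrix power)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        result = mat_mul(result, step_matrix)
    return result


a, b = 0.5, 1.0                          # assumed rates, for illustration only
one_step = ctmc_transition(a, b, 1.0)    # what a unit-step discrete model sees
two_steps = discrete_k_step(one_step, 2) # prediction at t = 2 via the grid
direct = ctmc_transition(a, b, 2.0)      # same prediction in continuous time
off_grid = ctmc_transition(a, b, 1.37)   # no discrete-time counterpart
```

On the grid the two views agree (`two_steps` matches `direct` up to rounding), but only the continuous-time form yields a prediction at an arbitrary time such as t = 1.37 without refitting at a finer step.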