Mathematical Institute, University of Oxford, Oxford, United Kingdom.
Department of Psychiatry, University of Oxford, Oxford, United Kingdom.
PLoS One. 2019 Feb 14;14(2):e0211558. doi: 10.1371/journal.pone.0211558. eCollection 2019.
Time-dependent data collected in studies of Alzheimer's disease usually contain missing and irregularly sampled data points. Time series methods that assume regular sampling therefore cannot be applied directly to the data without a pre-processing step. In this paper we use a random forest to learn the relationship between pairs of data points at different time separations. The input vector is a summary of the time series history, and it also includes demographic variables and non-time-varying variables such as genetic data. To test the method we use data from the TADPOLE grand challenge, an initiative which aims to predict the evolution of subjects at risk of Alzheimer's disease using demographic, physical and cognitive input data. The task is to predict diagnosis, ADAS-13 score and normalised ventricle volume. While the competition is in progress, forecasting methods may be compared using a leaderboard dataset selected from the Alzheimer's Disease Neuroimaging Initiative (ADNI), together with standard accuracy metrics. For diagnosis, we find a multiclass area under the ROC curve (mAUC) of 0.82 and a balanced classification accuracy (BCA) of 0.73, compared with a benchmark SVM predictor which gives mAUC = 0.62 and BCA = 0.52. These results indicate that the method is effective and performs comparably with other methods.
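As an illustration of the pairing idea described above, the following minimal sketch turns one subject's irregularly sampled visits into training rows, each linking an earlier observation to a later one through their time separation. It is not the authors' implementation: the function name `build_pairs`, the feature layout, and the example values are assumptions made for illustration, and the "summary of the time series history" is reduced here to the single earlier value.

```python
from itertools import combinations


def build_pairs(visits, static):
    """Build (input, target) rows from one subject's visits.

    visits: list of (time_in_years, value) tuples; sampling may be
            irregular and value may be None for a missing measurement.
    static: list of non-time-varying features (hypothetical layout,
            e.g. age at baseline and a genetic marker).
    """
    # Drop missing measurements and order the remaining visits by time.
    observed = sorted((t, v) for t, v in visits if v is not None)

    X, y = [], []
    # Every ordered pair (earlier visit, later visit) gives one row:
    # input = [earlier value, time separation, static features...],
    # target = later value.
    for (t0, v0), (t1, v1) in combinations(observed, 2):
        X.append([v0, t1 - t0] + list(static))
        y.append(v1)
    return X, y


# Hypothetical subject: a score measured at irregular intervals,
# with one missing measurement at 0.5 years.
visits = [(0.0, 20.0), (0.5, None), (1.25, 23.0), (3.0, 30.0)]
static = [72.0, 1.0]  # assumed: age at baseline, genetic marker count
X, y = build_pairs(visits, static)
```

Rows built this way need no resampling onto a regular grid, because the time separation is itself an input feature. They could then be passed to any random forest implementation, for example scikit-learn's `RandomForestRegressor` for a continuous target such as ADAS-13.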