School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia.
School of Psychiatry, UNSW Sydney, Sydney, Australia.
Sci Rep. 2020 Nov 23;10(1):20410. doi: 10.1038/s41598-020-77220-w.
Data collected from clinical trials and cohort studies, such as dementia studies, are often high-dimensional, censored, heterogeneous and contain missing information, which makes traditional statistical analysis difficult. Methods that can overcome these challenges and model such complex data are urgently needed. At present there is no cure for dementia and no treatment that can successfully change the course of the disease. Machine learning models that predict the time until a patient develops dementia are important tools for understanding dementia risk, and they can give more accurate results than traditional statistical methods when modelling high-dimensional, heterogeneous clinical data. This work compares the performance and stability of ten machine learning algorithms, each combined with eight feature selection methods, that can perform survival analysis on high-dimensional, heterogeneous clinical data. We developed models that predict survival to dementia using baseline data from two different studies. The Sydney Memory and Ageing Study (MAS) is a longitudinal cohort study of 1037 participants, aged 70-90 years, that aims to determine the effects of ageing on cognition. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal study aimed at identifying biomarkers for the early detection and tracking of Alzheimer's disease. Using the concordance index as the performance measure, our models achieve maximum values of 0.82 for MAS and 0.93 for ADNI.
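The concordance index (C-index) mentioned above scores how well a model's predicted risks rank participants by their observed time to dementia, while still using censored participants where possible. The abstract does not state which C-index variant was used, so the following is only a minimal sketch of Harrell's C-index, assuming that a higher risk score means earlier predicted onset. The function name `harrell_c_index` and the toy data are hypothetical illustrations, not the authors' code.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data (sketch).

    times:       observed time for each participant (event or censoring time)
    events:      1 if the event (e.g. dementia onset) was observed, 0 if censored
    risk_scores: model output; assumed here that higher = earlier predicted event
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0  # concordant pairs count 1, tied predictions count 0.5
    comparable = 0    # number of usable (comparable) pairs
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if participant i had the event
            # strictly before participant j's observed (event or censoring) time.
            # Pairs with tied times are skipped in this simplified version.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranked the earlier event as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data: participant 2 is censored at time 6.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.2, 0.3, 0.1]
    print(harrell_c_index(times, events, risks))  # 4 of 5 pairs concordant -> 0.8
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is O(n^2), which is acceptable for cohorts of around a thousand participants; established survival-analysis libraries provide optimised and tie-aware implementations.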