Ahmad Alwani Liyana, Sanchez-Bornot Jose M, Sotero Roberto C, Coyle Damien, Idris Zamzuri, Faye Ibrahima
Department of Fundamental and Applied Sciences, Faculty of Science and Information Technology, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia.
Department of Neurosciences, Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia.
PeerJ. 2024 Dec 13;12:e18490. doi: 10.7717/peerj.18490. eCollection 2024.
Alzheimer's disease (AD) is a neurodegenerative disorder for which early detection is critical to effective intervention. Magnetic resonance imaging (MRI) is a key tool in AD research because of its availability and cost-effectiveness in clinical settings.
This study aims to conduct a comprehensive analysis of machine learning (ML) methods for MRI-based biomarker selection and classification to investigate early cognitive decline in AD. The focus is on discriminating between healthy control (HC) participants who remained stable and those who developed mild cognitive impairment (MCI) within five years (unstable HC or uHC).
3-Tesla (3T) MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Open Access Series of Imaging Studies 3 (OASIS-3) were used, focusing on HC and uHC groups. FreeSurfer's recon-all and other tools were used to extract anatomical biomarkers from subcortical and cortical brain regions. ML techniques were applied for feature selection and classification, using the MATLAB Classification Learner (MCL) app for initial analysis, followed by advanced methods such as nested cross-validation and Bayesian optimization, which were evaluated within a Monte Carlo replication analysis as implemented in our customized pipeline. Additionally, polynomial regression-based data harmonization techniques were used to enhance ML and statistical analysis. ML classifiers were evaluated using performance metrics such as accuracy (Acc), area under the receiver operating characteristic curve (AROC), F1-score, and a normalized Matthews correlation coefficient (MCC').
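The nested cross-validation and Monte Carlo replication described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the k-nearest-neighbour classifier, the small grid over `k` (standing in for Bayesian optimization), the unstratified folds, and the synthetic two-cluster data are all assumptions chosen to keep the example self-contained. The point it shows is that the hyperparameter is chosen only on inner folds, so each outer test fold stays untouched until the final score.

```python
import random
from statistics import mean


def make_toy_data(n_per_class=30, seed=42):
    """Synthetic two-feature data (hypothetical, not MRI biomarkers)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_per_class)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k roughly equal folds."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def knn_predict(X, y, train_idx, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train_idx,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
    )[:k]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X, y, train_idx, test_idx, k):
    hits = sum(knn_predict(X, y, train_idx, X[i], k) == y[i] for i in test_idx)
    return hits / len(test_idx)


def nested_cv(X, y, k_grid=(1, 3, 5), outer_k=5, inner_k=3, seed=0):
    """Mean outer-fold accuracy, with k tuned on inner folds only."""
    rng = random.Random(seed)
    outer = k_folds(list(range(len(y))), outer_k, rng)
    outer_scores = []
    for o, test_idx in enumerate(outer):
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(train_idx, inner_k, rng)

        def inner_score(k):
            # Validation accuracy averaged over the inner folds.
            scores = []
            for v, val_idx in enumerate(inner):
                fit_idx = [i for f, fold in enumerate(inner) if f != v for i in fold]
                scores.append(accuracy(X, y, fit_idx, val_idx, k))
            return mean(scores)

        best_k = max(k_grid, key=inner_score)  # model selection: inner loop
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best_k))
    return mean(outer_scores)


X, y = make_toy_data()
# Monte Carlo replication: repeat the whole procedure with different splits.
replicate_scores = [nested_cv(X, y, seed=s) for s in range(5)]
print(round(mean(replicate_scores), 3))
```

Averaging over several seeds mirrors the idea of replication: a single random split can be lucky or unlucky, while the spread across replicates indicates how stable the estimate is.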
Feature selection consistently identified biomarkers across ADNI and OASIS-3, with the entorhinal, hippocampus, lateral ventricle, and lateral orbitofrontal regions being the most affected. Classification results varied between balanced and imbalanced datasets and between ADNI and OASIS-3. For ADNI balanced datasets, the naïve Bayes model using z-score harmonization and ReliefF feature selection performed best (Acc = 69.17%, AROC = 77.73%, F1 = 69.21%, MCC' = 69.28%). For OASIS-3 balanced datasets, the support vector machine (SVM) with z-score-corrected data outperformed others (Acc = 66.58%, AROC = 72.01%, MCC' = 66.78%), while logistic regression had the best F1-score (66.68%). In imbalanced data, RUSBoost showed the strongest overall performance on ADNI (F1 = 50.60%, AROC = 81.54%) and OASIS-3 (MCC' = 63.31%). SVM excelled on ADNI in terms of Acc (82.93%) and MCC' (70.21%), while naïve Bayes performed best on OASIS-3 by F1 (42.54%) and AROC (70.33%).
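To make the reported metrics easier to read side by side, the following Python sketch computes Acc, F1, and a normalized MCC from a confusion matrix. The abstract does not define MCC'; the sketch assumes the common rescaling MCC' = (MCC + 1) / 2, which maps the range [-1, 1] onto [0, 1]. The confusion-matrix counts are invented for illustration and are not taken from the study.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def normalized_mcc(tp, tn, fp, fn):
    """Assumed normalization: MCC' = (MCC + 1) / 2, in [0, 1]."""
    return (mcc(tp, tn, fp, fn) + 1) / 2


# Hypothetical balanced case: 100 positives, 100 negatives, symmetric errors.
balanced = dict(tp=69, tn=69, fp=31, fn=31)
# Hypothetical imbalanced case: 10 positives, 90 negatives.
imbalanced = dict(tp=5, tn=85, fp=5, fn=5)

for name, cm in (("balanced", balanced), ("imbalanced", imbalanced)):
    print(
        name,
        round(accuracy(**cm), 4),
        round(f1_score(**cm), 4),
        round(normalized_mcc(**cm), 4),
    )
```

Under this assumed normalization, a balanced dataset with symmetric errors gives MCC' equal to Acc, which is consistent with the near-identical Acc and MCC' values reported for the balanced datasets. In the imbalanced example, Acc is high while F1 is much lower, the same pattern seen in the imbalanced results.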
Data harmonization significantly improved the consistency and performance of feature selection and ML classification, with z-score harmonization yielding the best results. This study also highlights the importance of nested cross-validation (CV) to control overfitting and the potential of a semi-automatic pipeline for early AD detection using MRI, with future applications integrating other neuroimaging data to enhance prediction.