Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
Division of Psychiatry, University of Edinburgh, Kennedy Tower, Royal Edinburgh Hospital, Edinburgh, UK.
Hum Brain Mapp. 2022 Dec 1;43(17):5126-5140. doi: 10.1002/hbm.26010. Epub 2022 Jul 19.
Application of machine learning (ML) algorithms to structural magnetic resonance imaging (sMRI) data has yielded behaviorally meaningful estimates of the biological age of the brain (brain-age). The choice of the ML approach in estimating brain-age in youth is important because age-related brain changes in this age group are dynamic. However, the comparative performance of the available ML algorithms has not been systematically appraised. To address this gap, the present study evaluated the accuracy (mean absolute error [MAE]) and computational efficiency of 21 ML algorithms using sMRI data from 2105 typically developing individuals aged 5-22 years from five cohorts. The trained models were then tested in two independent holdout datasets, one comprising 4078 individuals aged 9-10 years and another comprising 594 individuals aged 5-21 years. The algorithms encompassed parametric and nonparametric, Bayesian, linear and nonlinear, tree-based, and kernel-based models. Sensitivity analyses were performed for parcellation scheme, number of neuroimaging input features, number of cross-validation folds, number of extreme outliers, and sample size. Tree-based models and algorithms with a nonlinear kernel performed comparably well, with the latter being especially computationally efficient. Extreme Gradient Boosting (MAE of 1.49 years), Random Forest Regression (MAE of 1.58 years), and Support Vector Regression (SVR) with Radial Basis Function (RBF) Kernel (MAE of 1.64 years) emerged as the three most accurate models. Linear algorithms, with the exception of Elastic Net Regression, performed poorly. Findings of the present study could be used as a guide for optimizing methodology when quantifying brain-age in youth.
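The accuracy metric named in the abstract, MAE estimated under k-fold cross-validation, can be illustrated with a minimal sketch. This is not the study's pipeline: the data are synthetic, the single input `feature` is a hypothetical stand-in for an sMRI-derived measure, and a one-variable least-squares fit replaces the 21 algorithms that were actually compared. All function names are illustrative assumptions.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between chronological and predicted age (years)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def linear_model(x_train, y_train):
    """Fit one-feature ordinary least squares; return a prediction function."""
    mx, my = mean(x_train), mean(y_train)
    sxx = sum((xi - mx) ** 2 for xi in x_train)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept


def mean_baseline(x_train, y_train):
    """Reference model: always predict the mean training age."""
    m = mean(y_train)
    return lambda v: m


def cross_validated_mae(x, y, k, fit):
    """Train on k-1 folds, score MAE on the held-out fold, average over folds."""
    fold_errors = []
    for test_idx in kfold_indices(len(x), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        predict = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(x[i]) for i in test_idx]
        fold_errors.append(mean_absolute_error([y[i] for i in test_idx], y_pred))
    return mean(fold_errors)


# Synthetic data (assumption): ages 5-22 and one noisy feature that declines with age.
rng = random.Random(42)
ages = [rng.uniform(5, 22) for _ in range(200)]
feature = [3.5 - 0.05 * a + rng.gauss(0, 0.1) for a in ages]

mae_linear = cross_validated_mae(feature, ages, 5, linear_model)
mae_baseline = cross_validated_mae(feature, ages, 5, mean_baseline)
print(f"5-fold MAE, linear model:  {mae_linear:.2f} years")
print(f"5-fold MAE, mean baseline: {mae_baseline:.2f} years")
```

Comparing each candidate model against a mean-age baseline under the same folds is one simple way to put MAE values in context; varying `k` in `cross_validated_mae` mirrors, in spirit only, the sensitivity analysis on the number of cross-validation folds.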