Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, USA.
Evelyn F. McKnight Brain Institute, University of Miami, Miami, FL, USA.
Brain Imaging Behav. 2021 Jun;15(3):1270-1278. doi: 10.1007/s11682-020-00325-3.
High-dimensional neuroimaging datasets and machine learning have been used to estimate and predict domain-specific cognition, but comparisons with simpler models composed of easy-to-measure variables are limited. Regularization methods in particular may help identify regions of interest related to domain-specific cognition. Using data from the Northern Manhattan Study, a cohort study of mostly Hispanic older adults, we compared three models estimating domain-specific cognitive performance: a basic model with sociodemographics and APOE ε4 allele status, the basic model plus MRI markers, and a model with only MRI markers. We fit the regression models with several machine learning methods: elastic net, support vector regression, random forest, and principal components regression. Model performance was assessed with the root mean squared error (RMSE), mean absolute error (MAE), and R² statistics using 5-fold cross-validation. To assess whether prediction models with imaging biomarkers were more predictive than prediction models built with randomly generated biomarkers, we refit the elastic net models on 1000 datasets with random biomarkers and compared the distributions of RMSE and R² from these random-biomarker models with the RMSE and R² from the observed models. Basic models explained approximately 31-38% of the variance in domain-specific cognition. Adding MRI markers did not improve estimation. However, elastic net models with only MRI markers performed significantly better than models with random MRI markers (one-sided P < .05) and yielded regions of interest consistent with previous literature as well as others not previously explored. Therefore, structural brain MRI markers may be more useful for etiological than for predictive modeling.
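The random-biomarker comparison described above can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' code. The abstract does not say how the random biomarkers were generated or how the elastic net was tuned, so the Gaussian noise markers, the fixed `alpha` and `l1_ratio` values, and the function names `cv_rmse_r2` and `random_marker_test` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_rmse_r2(X, y, seed=0):
    """Return (RMSE, R^2) computed from 5-fold cross-validated predictions."""
    # Assumption: fixed penalty settings; a real analysis would tune them.
    model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5))
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X, y, cv=folds)
    sse = np.sum((y - pred) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(np.sqrt(sse / len(y))), float(1.0 - sse / sst)


def random_marker_test(X, y, n_random=1000, seed=0):
    """Compare the observed CV RMSE with RMSEs from random-marker datasets.

    Returns (observed RMSE, observed R^2, one-sided empirical P value).
    """
    rng = np.random.default_rng(seed)
    obs_rmse, obs_r2 = cv_rmse_r2(X, y, seed)
    null_rmse = np.empty(n_random)
    for i in range(n_random):
        # Assumption: random markers are independent standard normal noise
        # with the same shape as the observed marker matrix.
        X_rand = rng.standard_normal(X.shape)
        null_rmse[i], _ = cv_rmse_r2(X_rand, y, seed)
    # One-sided P: share of random datasets that predict at least as well
    # (RMSE no larger) as the observed markers, with a +1 correction.
    p = (1 + np.sum(null_rmse <= obs_rmse)) / (1 + n_random)
    return obs_rmse, obs_r2, float(p)


if __name__ == "__main__":
    # Synthetic stand-in data: 200 participants, 10 hypothetical markers.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    y = X[:, :3].sum(axis=1) + 0.5 * rng.standard_normal(200)
    print(random_marker_test(X, y, n_random=100))
```

The same comparison could be made on R² by counting random datasets whose R² is at least as high as the observed value. The sketch uses far fewer than 1000 random datasets in its demonstration only to keep the run time short.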