Easley Ty, Chen Ruiqi, Hannon Kayla, Dutt Rosie, Bijsterbosch Janine
Department of Radiology, Washington University School of Medicine, Saint Louis, Missouri, 63110, USA.
Division of Biology and Biomedical Sciences, Washington University in St. Louis, Saint Louis, Missouri, 63110, USA.
Neuroimage Rep. 2023 Mar 13;3(2):100163. doi: 10.1016/j.ynirp.2023.100163. eCollection 2023 Jun.
Efforts to predict trait phenotypes based on functional MRI data from large cohorts have been hampered by low prediction accuracy and/or small effect sizes. Although these findings are highly replicable, the small effect sizes are somewhat surprising given the presumed brain basis of phenotypic traits such as neuroticism and fluid intelligence. We aim to replicate previous work and additionally test multiple data manipulations that may improve prediction accuracy by addressing data pollution challenges. Specifically, we added further fMRI features, averaged the target phenotype across multiple measurements to obtain more accurate estimates of the underlying trait, balanced the target phenotype's distribution through undersampling of majority scores, and identified data-driven subtypes to investigate the impact of between-participant heterogeneity. Our results replicated those of Dadi et al. (2021) in a larger sample. Each data manipulation further led to small but consistent improvements in prediction accuracy, and these improvements were largely additive when multiple data manipulations were combined. Combining data manipulations (i.e., extended fMRI features, averaged target phenotype, balanced target phenotype distribution) led to a three-fold increase in prediction accuracy for fluid intelligence compared to prior work. These findings highlight the benefit of several relatively easy and low-cost data manipulations, which may positively impact future work.
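Two of the manipulations named in the abstract, averaging the target phenotype across repeated measurements and undersampling majority scores, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the rounding-based binning, the `max_per_bin` cap, and the toy scores are all assumptions made for illustration, and the abstract does not specify how the balancing was actually performed.

```python
import random
from collections import defaultdict
from statistics import mean


def average_target(repeated_scores):
    """Average each participant's available measurements (None marks a missing visit)."""
    averaged = []
    for scores in repeated_scores:
        observed = [s for s in scores if s is not None]
        # Participants with no measurement at all keep a missing target.
        averaged.append(mean(observed) if observed else None)
    return averaged


def undersample_majority(scores, max_per_bin, seed=0):
    """Return sorted participant indices, keeping at most max_per_bin per score bin.

    Assumption: bins are defined by rounding each score to the nearest integer
    with Python's built-in round().
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for idx, score in enumerate(scores):
        if score is not None:
            bins[round(score)].append(idx)
    kept = []
    for members in bins.values():
        if len(members) > max_per_bin:
            # Randomly drop participants from over-represented (majority) bins.
            members = rng.sample(members, max_per_bin)
        kept.extend(members)
    return sorted(kept)


# Toy data (hypothetical): two measurement occasions per participant.
repeated = [[5, 7], [6, None], [6, 6], [6, 6], [2, 4], [None, None]]
targets = average_target(repeated)
kept = undersample_majority(targets, max_per_bin=2)
```

In this toy example, four participants share an averaged score of 6, so two of them are dropped at random, while the single participant with a score of 3 is always retained. A predictive model would then be trained only on the participants in `kept`.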