Orlichenko Anton, Qu Gang, Zhou Ziyu, Liu Anqi, Deng Hong-Wen, Ding Zhengming, Stephen Julia M, Wilson Tony W, Calhoun Vince D, Wang Yu-Ping
bioRxiv. 2024 May 16:2024.05.16.594528. doi: 10.1101/2024.05.16.594528.
fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging.
We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP).
We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. The exceptions are several fields related to schizophrenia medication and symptom severity.
Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics.
Our DemoVAE model allows for the generation of high-quality synthetic data conditioned on subject demographics, as well as the removal of the confounding effects of demographics. We find that FC-based prediction tasks are highly influenced by demographic confounds.
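To make the idea of decorrelating latents from demographics concrete, the following is a minimal sketch and not the DemoVAE implementation. It assumes one plausible way of quantifying the dependence: the mean squared Pearson correlation between each latent dimension and a single demographic variable. The function name `demographic_correlation_penalty`, the synthetic ages, and the latent dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np

def demographic_correlation_penalty(latents, demographic):
    """Mean squared Pearson correlation between each latent dimension
    and one demographic variable (hypothetical illustrative measure).

    latents:     array of shape (n_subjects, n_latent)
    demographic: array of shape (n_subjects,)
    """
    z = latents - latents.mean(axis=0)       # center each latent dimension
    d = demographic - demographic.mean()     # center the demographic variable
    cov = z.T @ d / len(d)                   # covariance per latent dimension
    denom = z.std(axis=0) * d.std() + 1e-8   # guard against division by zero
    corr = cov / denom                       # Pearson correlation per dimension
    return float(np.mean(corr ** 2))

# Synthetic data only: no real fMRI or cohort data is used here.
rng = np.random.default_rng(0)
age = rng.uniform(8, 22, size=500)                         # made-up ages
independent = rng.normal(size=(500, 4))                    # latents unrelated to age
confounded = independent + 0.5 * (age[:, None] - age.mean())  # latents leaking age

penalty_independent = demographic_correlation_penalty(independent, age)
penalty_confounded = demographic_correlation_penalty(confounded, age)
print(penalty_independent, penalty_confounded)
```

In a training setting, a quantity of this kind could be added to a VAE loss so that latents carrying demographic information are penalized. Whether DemoVAE uses this exact form is not stated in the abstract.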