Institute of Neurosciences, University of Barcelona, Barcelona, Catalonia, Spain.
Medical Psychology Unit, Department of Medicine, University of Barcelona, Barcelona, Catalonia, Spain.
Hum Brain Mapp. 2022 Jul;43(10):3130-3142. doi: 10.1002/hbm.25838. Epub 2022 Mar 19.
Multi-site MRI datasets are crucial for big data research. However, neuroimaging studies that pool data across sites must contend with the batch effect. Here, we propose an approach that uses the predictive probabilities provided by Gaussian processes (GPs) to harmonize clinical studies. A multi-site dataset of 216 Parkinson's disease (PD) patients and 87 healthy subjects (HS) was used. We performed a site GP classification using MRI data. The outcomes estimated from this classification, redefined as Weighted HARMonization PArameters (WHARMPA), were used as regressors in two different clinical studies: a PD versus HS machine learning classification using GP, and a VBM comparison (FWE-p < .05, k = 100). The same studies were also conducted using conventional Boolean site covariates, and without any site information. The site GP classification achieved high scores: balanced accuracy (BAC) was 98.39% for grey matter images. PD versus HS classification performed better when the WHARMPA were used to harmonize (BAC = 78.60%; AUC = 0.90) than when using the Boolean site information (BAC = 56.31%; AUC = 0.71) or no site information (BAC = 57.22%; AUC = 0.73). The VBM analysis harmonized using WHARMPA provided larger and more statistically robust clusters in regions previously reported in PD than when the Boolean site covariates or no corrections were added to the model. In conclusion, WHARMPA might encode global site effects quantitatively and allow the harmonization of data. This method is user-friendly and provides a powerful solution, without complex implementations, for cleaning analyses by removing variability associated with differences between sites.
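As an illustration of the general idea only, the following minimal sketch derives per-subject site probabilities from a GP site classifier and uses them as continuous nuisance regressors. It is not the authors' implementation: the use of scikit-learn, the linear (`DotProduct`) kernel, the out-of-fold prediction scheme, the least-squares residualization step, the helper names `site_probabilities` and `residualize`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import DotProduct
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def site_probabilities(features, site_labels, n_splits=5, seed=0):
    """Out-of-fold GP predictive probabilities of site membership.

    Returns an array of shape (n_subjects, n_sites). The linear kernel and
    the cross-validation scheme are assumptions, not taken from the paper.
    """
    clf = GaussianProcessClassifier(kernel=DotProduct(), random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_predict(
        clf, features, site_labels, cv=cv, method="predict_proba"
    )


def residualize(features, covariates):
    """Remove the part of each feature linearly explained by the covariates.

    The probability columns sum to one, so they are collinear with the
    intercept; lstsq returns the minimum-norm solution in that case.
    """
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ beta


# Synthetic stand-in for image-derived features: 3 hypothetical sites,
# 30 subjects each, 20 features, with a site-specific additive offset.
rng = np.random.default_rng(0)
n_sites, n_per_site, n_features = 3, 30, 20
site = np.repeat(np.arange(n_sites), n_per_site)
offsets = rng.normal(scale=2.0, size=(n_sites, n_features))
X = rng.normal(size=(n_sites * n_per_site, n_features)) + offsets[site]

probs = site_probabilities(X, site)   # continuous site regressors
X_clean = residualize(X, probs)       # features with the site signal regressed out
```

Here the probabilities play the role that Boolean site dummies would otherwise play in the design matrix. In a mass-univariate analysis such as VBM they would more typically be entered as covariates in the model rather than regressed out beforehand.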