Department of Biomedical Engineering, National University of Singapore, Singapore.
Department of Biomedical Engineering, National University of Singapore, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore.
Neuroimage. 2018 Jun;173:57-71. doi: 10.1016/j.neuroimage.2018.01.073. Epub 2018 Feb 12.
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html.
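For illustration, the mass-univariate model described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the CATE procedure or the authors' software: the function names `mass_univariate_ols` and `residual_pcs`, the array shapes, and the simulated data are all hypothetical. The residual PCA step is only a generic way to look for shared structure that the chosen covariates fail to explain; it is not the PCA estimator proposed in the study.

```python
import numpy as np


def mass_univariate_ols(X, Y):
    """Fit one linear model per voxel with a shared design matrix.

    X: (n_subjects, n_covariates) design matrix (assumed full column rank).
    Y: (n_subjects, n_voxels) image data, one column per voxel.
    Returns coefficients, t-statistics, and residuals.
    """
    n, p = X.shape
    # A single least-squares solve handles every voxel (column of Y) at once.
    beta, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    # Unbiased residual variance estimate per voxel.
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)
    # Standard error of each coefficient at each voxel.
    xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))
    se = np.sqrt(np.outer(xtx_inv_diag, sigma2))
    return beta, beta / se, resid


def residual_pcs(resid, k):
    """Return the k leading principal components of the residuals.

    Large leading singular values hint at structure shared across voxels
    that the covariates in X do not capture (illustrative diagnostic only).
    """
    centered = resid - resid.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k], s


if __name__ == "__main__":
    # Hypothetical simulated data: 50 subjects, 200 voxels, intercept + 1 covariate.
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(50), rng.normal(size=50)])
    Y = X @ rng.normal(size=(2, 200)) + rng.normal(size=(50, 200))
    beta, t, resid = mass_univariate_ols(X, Y)
    pcs, s = residual_pcs(resid, k=3)
    print(beta.shape, t.shape, pcs.shape)
```

Sharing one design matrix across voxels is what makes the fit cheap, and it is also why a misspecified covariate set affects every voxel-level test at once.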