CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China.
Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.
Hum Brain Mapp. 2018 Jan;39(1):300-318. doi: 10.1002/hbm.23843. Epub 2017 Oct 11.
Concerns have been raised about the reproducibility of resting-state functional magnetic resonance imaging (R-fMRI) findings. Little is known about how to operationally define R-fMRI reproducibility, or about the extent to which it is affected by multiple comparison correction strategies and sample size. We comprehensively assessed two aspects of reproducibility, test-retest reliability and replicability, for widely used R-fMRI metrics in both between-subject contrasts of sex differences and within-subject comparisons of eyes-open and eyes-closed (EOEC) conditions. We found that the permutation test with Threshold-Free Cluster Enhancement (TFCE), a strict multiple comparison correction strategy, reached the best balance between family-wise error rate (under 5%) and test-retest reliability/replicability (e.g., 0.68 for test-retest reliability and 0.25 for replicability of amplitude of low-frequency fluctuations (ALFF) for between-subject sex differences, and 0.49 for replicability of ALFF for within-subject EOEC differences). Although R-fMRI indices attained moderate reliabilities, they replicated poorly in distinct datasets (replicability < 0.3 for between-subject sex differences, < 0.5 for within-subject EOEC differences). By randomly drawing samples of different sizes from a single site, we found that reliability, sensitivity and positive predictive value (PPV) rose as sample size increased. In sex differences, small sample sizes (e.g., < 80 [40 per group]) not only minimized power (sensitivity < 2%), but also decreased the likelihood that significant results reflect "true" effects (PPV < 0.26). Our findings have implications for how to select multiple comparison correction strategies and highlight the importance of sufficiently large sample sizes in R-fMRI studies to enhance reproducibility. Hum Brain Mapp 39:300-318, 2018. © 2017 Wiley Periodicals, Inc.
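The sensitivity and PPV figures quoted above can be read with the standard definitions in mind: sensitivity is the share of "true" effects that a test detects, and PPV is the share of detections that are "true". The sketch below illustrates only those textbook definitions on toy data. The abstract does not say how the reference ("true") map was obtained or how the metrics were computed per voxel, so the function name `sensitivity_and_ppv`, the boolean-mask representation, and the example values are all assumptions made for illustration.

```python
def sensitivity_and_ppv(detected, reference):
    """Return (sensitivity, PPV) for two equal-length boolean sequences.

    detected:  True where a test declared a significant effect (hypothetical mask).
    reference: True where the effect is treated as "true" (hypothetical mask;
               how such a reference is defined is an assumption here).
    """
    if len(detected) != len(reference):
        raise ValueError("detected and reference must have the same length")

    pairs = list(zip(detected, reference))
    tp = sum(1 for d, r in pairs if d and r)        # true positives
    fp = sum(1 for d, r in pairs if d and not r)    # false positives
    fn = sum(1 for d, r in pairs if not d and r)    # false negatives

    # Sensitivity = TP / (TP + FN); undefined if there are no true effects.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV = TP / (TP + FP); undefined if nothing was detected.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy example: 8 "voxels", 4 of which carry a true effect.
reference = [True, True, True, True, False, False, False, False]
detected = [True, False, False, False, True, False, False, False]

sens, ppv = sensitivity_and_ppv(detected, reference)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

In this toy case one of four true effects is found and one of two detections is correct. An underpowered test can show both symptoms at once: it finds few true effects, and the few results it reports include a larger share of false positives.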