Bansal Ravi, Peterson Bradley S
Institute for the Developing Mind, Children's Hospital Los Angeles, CA 90027, USA; Department of Pediatrics, Keck School of Medicine at the University of Southern California, Los Angeles, CA 90033, USA.
Institute for the Developing Mind, Children's Hospital Los Angeles, CA 90027, USA; Department of Psychiatry, Keck School of Medicine at the University of Southern California, Los Angeles, CA 90033, USA.
Magn Reson Imaging. 2018 Jun;49:101-115. doi: 10.1016/j.mri.2018.01.004. Epub 2018 Feb 3.
Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, which requires controlling false positive findings across these multiple hypothesis tests. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI (fMRI) data, thereby leading to false positive rates higher than their nominal rates. Nonparametric methods for statistical inference across multiple tests, in contrast, are thought to produce false positives at the nominal rate, which has led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To better understand why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%. The high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters, which are inherently present in smooth, high-dimensional random fields. In fact, when these very large clusters were discounted, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis.
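The cluster-size behavior described above can be explored with a minimal sketch: smooth white noise into an approximate 2D Gaussian random field, apply a cluster-defining threshold, and list the sizes of the suprathreshold clusters. This is not the authors' simulation code. The grid size, smoothing width, periodic boundaries, 4-connectivity, and all function names are assumptions chosen for illustration only.

```python
import math
import random

def gaussian_kernel(sigma, radius):
    """1D Gaussian kernel scaled to unit sum of squares, so that
    smoothed unit-variance white noise keeps unit variance."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def smooth_rows(field, kernel):
    """Convolve every row with the kernel, wrapping at the edges."""
    radius = len(kernel) // 2
    n = len(field[0])
    return [[sum(kernel[k + radius] * row[(j + k) % n]
                 for k in range(-radius, radius + 1))
             for j in range(n)] for row in field]

def transpose(field):
    return [list(col) for col in zip(*field)]

def simulate_grf_2d(size, sigma, rng):
    """Approximate 2D GRF on a size x size periodic grid (assumed setup)."""
    kernel = gaussian_kernel(sigma, int(3 * sigma))
    noise = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    field = smooth_rows(noise, kernel)                        # smooth along rows
    return transpose(smooth_rows(transpose(field), kernel))  # then along columns

def cluster_sizes(field, cdt):
    """Sizes of 4-connected clusters of values >= cdt on a square periodic grid."""
    size = len(field)
    seen = [[False] * size for _ in range(size)]
    sizes = []
    for i in range(size):
        for j in range(size):
            if seen[i][j] or field[i][j] < cdt:
                continue
            seen[i][j] = True
            stack, count = [(i, j)], 0
            while stack:  # iterative flood fill
                a, b = stack.pop()
                count += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = (a + da) % size, (b + db) % size
                    if not seen[na][nb] and field[na][nb] >= cdt:
                        seen[na][nb] = True
                        stack.append((na, nb))
            sizes.append(count)
    return sizes

# Pool cluster sizes over a few null realizations (illustrative settings).
rng = random.Random(0)
pooled = []
for _ in range(20):
    pooled.extend(cluster_sizes(simulate_grf_2d(64, 2.0, rng), 2.5))
if pooled:
    mean_size = sum(pooled) / len(pooled)
    print("clusters:", len(pooled), "mean size:", round(mean_size, 2),
          "largest:", max(pooled))
```

Comparing the largest pooled cluster with the mean size gives a rough view of how heavy the tail of the cluster-size distribution is under the null; a faithful replication would need the field dimensions, smoothness, and sample counts used in the study.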
Nonparametric methods, in contrast, estimated distributions from those large clusters and therefore, by construction, rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether the fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods to detect true findings compared with parametric methods, which would have detected most of the true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, followed by rigorous assessment of the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques.
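The "by construction" property of nonparametric control can be illustrated with a minimal sketch of a maximum-statistic threshold: the cluster-size cutoff is taken from the empirical distribution of the largest null cluster per realization, so at most a fraction alpha of null realizations exceed it. This is a generic max-statistic procedure, not necessarily the exact method evaluated in the study. The function names are hypothetical and the input numbers are made up purely for illustration.

```python
import math

def max_stat_threshold(null_max_sizes, alpha=0.05):
    """Cluster-size cutoff such that at most a fraction alpha of the
    null maxima are strictly larger than it."""
    ordered = sorted(null_max_sizes)
    n = len(ordered)
    allowed = int(math.floor(alpha * n))  # number of null maxima allowed above the cutoff
    return ordered[n - 1 - allowed]

def empirical_fwer(max_sizes, threshold):
    """Fraction of realizations whose largest cluster exceeds the threshold."""
    return sum(1 for s in max_sizes if s > threshold) / len(max_sizes)

# Made-up null maxima: mostly small clusters plus two very large ones.
toy_null_max = [12, 9, 15, 11, 10, 14, 13, 8, 16, 12,
                11, 10, 9, 13, 14, 12, 15, 11, 600, 720]

with_large = max_stat_threshold(toy_null_max, alpha=0.05)
without_large = max_stat_threshold([s for s in toy_null_max if s < 100], alpha=0.05)

print("cutoff with very large null clusters:", with_large)
print("cutoff without them:", without_large)
print("FWER on the null sample:", empirical_fwer(toy_null_max, with_large))
```

In this toy input, the few very large null clusters pull the cutoff far above the typical cluster size, so any true cluster smaller than that cutoff would go undetected. That is the trade-off between nominal error control and statistical power that the abstract describes.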