Thirion Bertrand, Pinel Philippe, Mériaux Sébastien, Roche Alexis, Dehaene Stanislas, Poline Jean-Baptiste
INRIA Futurs, Service Hospitalier Frédéric Joliot, 4, Place du Général Leclerc, 91401 Orsay cedex, France.
Neuroimage. 2007 Mar;35(1):105-20. doi: 10.1016/j.neuroimage.2006.11.054. Epub 2007 Jan 18.
The aim of group fMRI studies is to relate contrasts of tasks or stimuli to increases in regional brain activity. These studies typically involve 10 to 16 subjects. The statistical significance of the average regional activity is assessed using the subject-to-subject variability of the effect (random effects analyses). Because relatively few subjects are included, the sensitivity and reliability of these analyses are questionable and hard to investigate. In this work, we use a very large number of subjects (more than 80) to investigate this issue. We take advantage of this large cohort to study the statistical properties of the inter-subject activity and focus on the notion of reproducibility by bootstrapping. We ask simple but important methodological questions: Is there, from the point of view of reliability, an optimal statistical threshold for activity maps? How many subjects should be included in group studies? What method should be preferred for inference? Our results suggest that i) optimal thresholds can indeed be found, and are rather lower than the usual thresholds corrected for multiple comparisons, ii) 20 subjects or more should be included in functional neuroimaging studies in order to have sufficient reliability, iii) non-parametric significance assessment should be preferred to parametric methods, iv) cluster-level thresholding is more reliable than voxel-based thresholding, and v) mixed effects tests are much more reliable than random effects tests. Moreover, our study shows that inter-subject variability plays a prominent role in the relatively low sensitivity and reliability of group studies.
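The resampling idea described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes synthetic data, a plain one-sample t statistic as the random effects test, repeated draws of two disjoint subgroups in place of the bootstrap scheme used in the study, and a Dice overlap as a stand-in reproducibility score. All function names, the threshold, and the simulated effect sizes are hypothetical.

```python
import numpy as np


def rfx_t_map(effects):
    """One-sample t statistic per voxel across subjects (random effects)."""
    n = effects.shape[0]
    mean = effects.mean(axis=0)
    sem = effects.std(axis=0, ddof=1) / np.sqrt(n)
    return mean / sem


def dice(a, b):
    """Dice overlap of two boolean maps; defined as 1.0 if both are empty."""
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / total


def split_reproducibility(effects, group_size, threshold, n_splits=50, seed=0):
    """Mean Dice overlap between thresholded maps of disjoint random subgroups."""
    n_subjects = effects.shape[0]
    if 2 * group_size > n_subjects:
        raise ValueError("need at least 2 * group_size subjects")
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_subjects)
        g1 = effects[perm[:group_size]]
        g2 = effects[perm[group_size:2 * group_size]]
        map1 = rfx_t_map(g1) > threshold
        map2 = rfx_t_map(g2) > threshold
        scores.append(dice(map1, map2))
    return float(np.mean(scores))


# Synthetic cohort (assumption): 80 subjects, 2000 voxels, 200 truly active.
rng = np.random.default_rng(42)
n_subjects, n_voxels = 80, 2000
signal = np.zeros(n_voxels)
signal[:200] = 1.0
# Large inter-subject variability relative to the effect size.
effects = signal + rng.normal(scale=2.0, size=(n_subjects, n_voxels))

r10 = split_reproducibility(effects, group_size=10, threshold=3.0)
r40 = split_reproducibility(effects, group_size=40, threshold=3.0)
print(f"mean Dice, groups of 10: {r10:.2f}")
print(f"mean Dice, groups of 40: {r40:.2f}")
```

Under these assumptions, varying `group_size` and `threshold` gives a rough feel for how group size and threshold choice affect the agreement between independent group maps, which is the kind of question the abstract raises.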