Parietal Project-team, INRIA Saclay-île de France, France; CEA/Neurospin bât 145, 91191 Gif-Sur-Yvette, France; Université Paris-Saclay, Saclay, France.
Neuroimage. 2018 Oct 15;180(Pt A):68-77. doi: 10.1016/j.neuroimage.2017.06.061. Epub 2017 Jun 24.
Predictive models underpin many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, and extraction of biomarkers. The principled approach to establishing their validity and usefulness is cross-validation, that is, testing prediction on unseen data. Here, I would like to raise awareness of the error bars of cross-validation, which are often underestimated. Simple experiments show that the sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, particularly in biomarker studies and methods development where, unlike in cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, and they must address the increase in data heterogeneity that larger samples may bring.
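To give a feel for the order of magnitude quoted in the abstract, the following is a minimal sketch that assumes each test prediction is an independent Bernoulli trial and uses the normal approximation to the binomial. That model, the function name `binomial_half_width`, and the choice of a 95% level (z = 1.96) are illustrative assumptions, not the procedure used in the paper, whose figures come from its own experiments. Under these assumptions, an accuracy near chance (0.5) measured on 100 samples has a 95% interval of roughly ±10%.

```python
import math


def binomial_half_width(accuracy, n_samples, z=1.96):
    """Half-width of a normal-approximation interval for an accuracy
    estimated on n_samples independent test predictions.

    Hypothetical helper for illustration: assumes a binomial model
    of test errors, which is not taken from the paper itself.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


if __name__ == "__main__":
    # The interval shrinks only as 1 / sqrt(n): halving the error bar
    # requires about four times as many samples.
    for n in (30, 100, 300, 1000):
        half_width = binomial_half_width(0.5, n)
        print(f"n = {n:4d}: +/- {100 * half_width:.1f} percentage points")
```

Because this simple model ignores the dependence between cross-validation folds, it should be read as a rough lower bound on the uncertainty rather than as a substitute for the experiments the abstract refers to.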