David Geffen School of Medicine at UCLA, USA.
Neuroimage. 2014 Jan 1;84:1107-10. doi: 10.1016/j.neuroimage.2013.07.050. Epub 2013 Jul 25.
The recent Chu et al. (2012) manuscript discusses two key findings regarding feature selection (FS): (1) data-driven FS was no better than using whole-brain voxel data, and (2) a priori biological knowledge was effective in guiding FS. FS is highly relevant in neuroimaging-based machine learning, because the number of attributes can greatly exceed the number of exemplars. We strongly endorse their demonstration of both findings, and we provide additional practical and theoretical arguments as to why, in their case, the data-driven FS methods they implemented did not improve accuracy. Further, we emphasize that the data-driven FS methods they tested performed approximately as well as the all-voxel case. We discuss why a sparse model may be favored over a complex one with similar performance. We caution readers that the findings in the Chu et al. report should not be generalized to all data-driven FS methods.