Bilwaj Gaonkar, Russell T. Shinohara, Christos Davatzikos
Center for Biomedical Image Computing and Analytics, United States.
Center for Biomedical Image Computing and Analytics, United States; Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, United States.
Med Image Anal. 2015 Aug;24(1):190-204. doi: 10.1016/j.media.2015.06.008. Epub 2015 Jun 25.
Machine learning classification algorithms such as support vector machines (SVMs) have shown great promise for turning high-dimensional neuroimaging data into clinically useful decision criteria. However, tracing the imaging patterns that contribute significantly to classifier decisions remains an open problem. The issue is critical in imaging studies that seek to determine which anatomical or physiological imaging features drive the classifier's decision, since identifying those features lets users critically evaluate the findings of such machine learning methods and understand disease mechanisms. Most published work addresses statistical inference for support vector classification using permutation tests based on SVM weight vectors. Such permutation testing ignores the SVM margin, which is central to SVM theory. In this work we emphasize the use of a statistic that explicitly accounts for the SVM margin and show that the null distributions associated with this statistic are asymptotically normal. Further, our experiments show that this statistic is far less conservative than weight-based permutation tests, yet specific enough to tease out multivariate patterns in the data. Thus, we can better understand the multivariate patterns that the SVM uses for neuroimaging-based classification.
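To make the contrast between the two kinds of test statistic concrete, the following is a minimal sketch of a label-permutation test run once on raw SVM weights and once on a margin-normalized version of them. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the formula of the margin-aware statistic, so the form used here, s_j = w_j / ||w||^2 (motivated by the hard-margin SVM margin being 2 / ||w||), is an assumption and may differ from the paper's definition. The function names, the simple subgradient solver, the regularization value, and the synthetic data are likewise hypothetical choices made so the example needs only the Python standard library.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=30):
    """Linear soft-margin SVM without bias, fit by batch subgradient descent
    on lam/2 * ||w||^2 + mean hinge loss, with step size 1 / (lam * t)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score < 1.0:  # this sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        step = 1.0 / (lam * t)
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def margin_statistic(w):
    """ASSUMED margin-aware statistic: s_j = w_j / ||w||^2.
    This form is an illustration only; the abstract does not define it."""
    norm_sq = sum(wj * wj for wj in w)
    if norm_sq == 0.0:
        return [0.0] * len(w)
    return [wj / norm_sq for wj in w]


def permutation_pvalues(X, y, n_perm=100, seed=0):
    """Two-sided permutation p-values per feature, for the raw weights and
    for the assumed margin-aware statistic, using the same label shuffles."""
    rng = random.Random(seed)
    obs_w = train_linear_svm(X, y)
    obs_s = margin_statistic(obs_w)
    d = len(obs_w)
    count_w = [0] * d
    count_s = [0] * d
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between images and labels
        w = train_linear_svm(X, labels)
        s = margin_statistic(w)
        for j in range(d):
            if abs(w[j]) >= abs(obs_w[j]):
                count_w[j] += 1
            if abs(s[j]) >= abs(obs_s[j]):
                count_s[j] += 1
    # The +1 terms count the observed labeling as one of the permutations.
    p_w = [(c + 1) / (n_perm + 1) for c in count_w]
    p_s = [(c + 1) / (n_perm + 1) for c in count_s]
    return p_w, p_s


# Synthetic stand-in for imaging data: 40 subjects, 5 features,
# with only feature 0 differing between the two groups.
data_rng = random.Random(42)
y = [1] * 20 + [-1] * 20
X = [[data_rng.gauss(1.5 * yi if j == 0 else 0.0, 1.0) for j in range(5)]
     for yi in y]

p_weight, p_margin = permutation_pvalues(X, y)
print("weight-based p-values:", p_weight)
print("margin-aware p-values:", p_margin)
```

Both statistics are evaluated on the same permuted labelings, so any difference in the resulting p-values comes from the statistic rather than from the shuffles. The sketch only estimates the null distributions empirically; it does not use the asymptotic normality result that the abstract reports, which would presumably allow p-values to be obtained without running the permutation loop.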