Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic resonance images.

Affiliation

Section on Functional Imaging Methods, Laboratory of Brain and Cognition, NIMH, NIH, Bethesda, USA.

Publication information

Neuroimage. 2012 Mar;60(1):59-70. doi: 10.1016/j.neuroimage.2011.11.066. Epub 2011 Dec 1.

Abstract

A growing number of studies use machine learning approaches to characterize patterns of anatomical difference discernible from neuroimaging data. The high dimensionality of image data often raises a concern that feature selection is needed to obtain optimal accuracy. Among previous studies, which mostly used fixed sample sizes, some show greater predictive accuracies with feature selection, whereas others do not. In this study, we compared four common feature selection methods: 1) pre-selected regions of interest (ROIs) based on prior knowledge; 2) univariate t-test filtering; 3) recursive feature elimination (RFE); and 4) t-test filtering constrained by ROIs. The predictive accuracies achieved with different sample sizes, with and without feature selection, were compared statistically. To demonstrate the effect, we used grey matter segmented from the T1-weighted anatomical scans collected by the Alzheimer's Disease Neuroimaging Initiative (ADNI) as the input features to a linear support vector machine classifier. The objective was to characterize the patterns of difference between Alzheimer's disease (AD) patients and cognitively normal subjects, and also between mild cognitive impairment (MCI) patients and normal subjects. In addition, we compared classification accuracies for distinguishing MCI patients who converted to AD within 12 months from MCI patients who did not convert within that period. Predictive accuracies from the two data-driven feature selection methods (t-test filtering and RFE) were no better than those achieved using whole-brain data. We showed that the most accurate characterizations could be achieved by using prior knowledge of where to expect neurodegeneration (hippocampus and parahippocampal gyrus). Therefore, feature selection can improve classification accuracy, but whether it does depends on the method adopted. In general, larger sample sizes yielded higher accuracies, and the advantage gained from using knowledge from the existing literature was smaller at larger sample sizes.
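To illustrate methods 2) and 4), the following is a minimal sketch, not the authors' implementation, of univariate t-test filtering with an optional ROI constraint. It assumes a pooled-variance (Student's) two-sample t statistic, features supplied as plain Python lists (one list of voxel values per subject), and hypothetical helper names `t_statistic` and `t_test_filter`. The abstract does not state the exact statistic, the number of retained voxels, or how ROIs were encoded.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (assumed Student's t)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    # Guard against constant features, which carry no class information.
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def t_test_filter(X, y, k, roi=None):
    """Return indices of the k features with the largest |t| between classes.

    X   -- list of subjects, each a list of feature values (e.g. grey-matter voxels)
    y   -- list of 0/1 class labels (e.g. 0 = normal, 1 = patient)
    k   -- number of features to keep (hypothetical parameter)
    roi -- optional set of allowed feature indices; when given, filtering is
           restricted to those indices (ROI-constrained t-test filtering)
    """
    candidates = range(len(X[0])) if roi is None else sorted(roi)
    scores = []
    for j in candidates:
        group1 = [x[j] for x, label in zip(X, y) if label == 1]
        group0 = [x[j] for x, label in zip(X, y) if label == 0]
        scores.append((abs(t_statistic(group1, group0)), j))
    # Rank by descending |t|; ties broken by feature index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Toy data: 4 subjects x 3 features; feature 0 separates the classes.
    X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [3.0, 5.0, 0.2], [3.1, 4.9, 0.1]]
    y = [0, 0, 1, 1]
    print(t_test_filter(X, y, 2))              # whole-brain filtering
    print(t_test_filter(X, y, 1, roi={1, 2}))  # filtering restricted to an ROI
```

In a classification study, the filter should be fit on the training subjects of each cross-validation fold only, with the selected indices then applied to the held-out subjects. Selecting features on the full data set before splitting leaks label information and inflates the estimated accuracy.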

摘要

越来越多的研究采用机器学习方法来描述从神经影像学数据中可识别的解剖差异模式。图像数据的高维性常常引起人们的关注,即需要进行特征选择以获得最佳的准确性。在之前的研究中,大多数使用固定的样本大小,有些研究表明特征选择具有更高的预测准确性,而有些则不然。在这项研究中,我们比较了四种常见的特征选择方法。1)基于先验知识的预先选择的感兴趣区域(ROI)。2)单变量 t 检验过滤。3)递归特征消除(RFE),以及 4)受 ROI 限制的 t 检验过滤。统计比较了不同样本大小、有无特征选择时的预测准确性。为了演示效果,我们使用从阿尔茨海默病神经影像学倡议(ADNI)收集的 T1 加权解剖扫描中分割的灰质作为线性支持向量机分类器的输入特征。目标是描述阿尔茨海默病(AD)患者与认知正常受试者之间差异的模式,以及描述轻度认知障碍(MCI)患者与正常受试者之间的差异。此外,我们还比较了在 12 个月内转化为 AD 的 MCI 患者和未转化为 AD 的 MCI 患者之间的分类准确性。两种数据驱动的特征选择方法(t 检验过滤和 RFE)的预测准确性并不优于使用全脑数据获得的准确性。我们表明,通过使用对神经退行性病变发生位置的先验知识(海马体和海马旁回),我们可以实现最准确的特征描述。因此,特征选择确实可以提高分类准确性,但这取决于所采用的方法。总的来说,更大的样本量可以获得更高的准确性,而利用现有文献中的知识则获得的优势较小。
