Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina at Chapel Hill, NC 27599, USA.
Department of Brain and Cognitive Engineering, Korea University, Seoul 02841, Republic of Korea.
Sci Rep. 2017 Mar 30;7:45269. doi: 10.1038/srep45269.
Classification is one of the most important tasks in machine learning. Because of feature redundancy or outliers among samples, using all available data to train a classifier may be suboptimal. For example, Alzheimer's disease (AD) is correlated with certain brain regions or single nucleotide polymorphisms (SNPs), so identifying the relevant features is critical for computer-aided diagnosis. Many existing methods first select features from structural magnetic resonance imaging (MRI) or SNP data and then use those features to build the classifier. However, in the presence of many redundant features, the most discriminative features are difficult to identify in a single step. We therefore formulate a hierarchical feature and sample selection framework that gradually selects informative features and discards ambiguous samples over multiple steps to improve classifier learning. To guide the data manifold preservation process, we use both labeled and unlabeled data during training, which makes our method semi-supervised. For validation, we conduct AD diagnosis experiments in which mutually informative features are selected from both MRI and SNP data and only the most discriminative samples are used for training. The superior classification results, compared with those of competing methods, demonstrate the effectiveness of our approach.
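The multi-step idea in the abstract (select features, discard ambiguous samples, repeat, with unlabeled data shaping a manifold term) can be illustrated with a small sketch. This is not the authors' formulation. As an assumption, a ridge-regularized least-squares fit with a graph-Laplacian smoothness penalty over labeled plus unlabeled samples stands in for the paper's objective; features are ranked by weight magnitude, and labeled samples with the smallest signed margin y·f(x) are treated as "ambiguous." The names `knn_laplacian` and `hierarchical_select` and all parameter defaults are hypothetical.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W of a symmetric binary k-NN graph."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[:k]] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the neighbor graph
    return np.diag(W.sum(axis=1)) - W

def hierarchical_select(X_lab, y, X_unl, n_steps=3, keep_frac=0.5,
                        drop_frac=0.1, alpha=0.1, beta=1.0, k=5):
    """Hypothetical multi-step feature/sample selection (illustrative only).

    Each step solves
        min_w ||Xl w - y||^2 + alpha * w^T Xall^T L Xall w + beta * ||w||^2,
    where Xall stacks labeled and unlabeled samples (the semi-supervised
    manifold term), then keeps the top features by |w| and drops the
    labeled samples with the smallest signed margin y * f(x).
    """
    feats = np.arange(X_lab.shape[1])
    samples = np.arange(X_lab.shape[0])
    for _ in range(n_steps):
        Xl = X_lab[np.ix_(samples, feats)]
        Xall = np.vstack([Xl, X_unl[:, feats]])
        L = knn_laplacian(Xall, k)
        d = len(feats)
        A = Xl.T @ Xl + alpha * Xall.T @ L @ Xall + beta * np.eye(d)
        w = np.linalg.solve(A, Xl.T @ y[samples])

        # Sample selection: discard misclassified or near-boundary samples.
        margin = y[samples] * (Xl @ w)
        n_drop = int(drop_frac * len(samples))
        if n_drop > 0:
            samples = samples[np.sort(np.argsort(margin)[n_drop:])]

        # Feature selection: keep the largest-magnitude weights.
        n_keep = max(1, int(np.ceil(keep_frac * d)))
        feats = feats[np.sort(np.argsort(-np.abs(w))[:n_keep])]
    return feats, samples
```

With 20 features and 60 labeled samples, the defaults shrink the feature set 20 → 10 → 5 → 3 and the labeled set 60 → 54 → 49 → 45. Ridge regression is used here only because it has a closed form; a sparsity-inducing penalty would be a more typical choice for feature selection.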