Wang Haibo, Singanamalli Asha, Ginsburg Shoshana, Madabhushi Anant
Med Image Comput Comput Assist Interv. 2014;17(Pt 3):385-92. doi: 10.1007/978-3-319-10443-0_49.
This paper presents Group-sparse Nonnegative supervised Canonical Correlation Analysis (GNCCA), a novel method for identifying discriminative features from multiple feature views. Existing correlation-based methods do not guarantee positive correlations among the selected features and often require a preliminary feature selection step to remove redundant features within each feature view. GNCCA aims to overcome these issues by incorporating (1) a nonnegativity constraint that guarantees positive correlations in the reduced representation and (2) a group-sparsity constraint that enables simultaneous between-view and within-view feature selection. In particular, GNCCA is designed to emphasize correlations between feature views and class labels so that the selected features guarantee better class separability. In this work, GNCCA was evaluated on three prostate cancer (CaP) prognosis tasks: (i) distinguishing 40 CaP patients with and without 5-year biochemical recurrence following radical prostatectomy by fusing quantitative features extracted from digitized pathology and proteomics, (ii) predicting in vivo CaP grade for 16 patients by fusing T2w and DCE MRI, and (iii) localizing CaP and benign regions on MR spectroscopy and MRI for 36 patients. For the three tasks, GNCCA identifies feature subsets comprising 2%, 1%, and 22% of the originally extracted features, respectively. With the same Support Vector Machine (SVM) classifier, these selected features achieve results better than or comparable to those obtained using all features. In addition, GNCCA consistently outperforms five state-of-the-art feature selection methods across all three datasets.
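The abstract describes the nonnegativity and group-sparsity constraints only at a high level. As a hedged illustration, assuming a sparse-group-lasso-style penalty (an L1 term for within-view sparsity plus a per-view L2 term for between-view sparsity) combined with a nonnegativity constraint, the proximal operator below shows how such constraints can zero out negatively associated features, weak features inside a view, and entire views at once. The function names, the penalty weights `lam1` and `lam2`, and the Pearson-correlation screen in `select_features` are hypothetical; this is a minimal sketch, not the authors' GNCCA solver, which optimizes a supervised CCA objective rather than a one-shot feature-label correlation.

```python
import math


def nonneg_sparse_group_prox(v, groups, lam1, lam2):
    """Proximal operator of
        lam1 * ||w||_1 + lam2 * sum_g ||w_g||_2 + indicator(w >= 0).

    v      -- list of floats (e.g. a gradient step on a canonical weight vector)
    groups -- list of index lists, one per feature view (assumed to partition v)
    """
    # Step 1: nonnegative soft-thresholding; negatively associated entries and
    # small positive entries become 0 (within-view selection).
    u = [max(x - lam1, 0.0) for x in v]
    w = [0.0] * len(v)
    # Step 2: block shrinkage; a view whose norm is <= lam2 is dropped entirely
    # (between-view selection).
    for g in groups:
        norm = math.sqrt(sum(u[i] ** 2 for i in g))
        if norm > lam2:
            scale = 1.0 - lam2 / norm
            for i in g:
                w[i] = scale * u[i]
    return w


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X, y, groups, lam1, lam2):
    """Hypothetical one-shot screen: score each feature by its correlation with
    the class label, then keep features with nonzero weight after the prox."""
    p = len(X[0])
    scores = [pearson([row[j] for row in X], y) for j in range(p)]
    w = nonneg_sparse_group_prox(scores, groups, lam1, lam2)
    return [j for j, wj in enumerate(w) if wj > 0.0]


# Toy data: 4 samples, view A = features 0-2, view B = features 3-4.
X = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
]
y = [0, 0, 1, 1]
groups = [[0, 1, 2], [3, 4]]

print(select_features(X, y, groups, lam1=0.1, lam2=0.6))  # [0]
print(select_features(X, y, groups, lam1=0.1, lam2=0.3))  # [0, 3]
```

In this toy example, raising `lam2` from 0.3 to 0.6 drops view B entirely, which is the between-view effect. Feature 2 (uncorrelated with the label) and features 1 and 4 (negatively correlated) are zeroed within their views in both settings, which is the combined effect of the L1 term and the nonnegativity constraint.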