IEEE Trans Pattern Anal Mach Intell. 2019 Feb;41(2):515-522. doi: 10.1109/TPAMI.2018.2794470. Epub 2018 Jan 17.
Discriminative methods commonly produce models that generalize relatively well. In real-world applications such as medical image analysis, however, this advantage is challenged by outlier data points (sample-outliers) and by noise in the predictor values (feature-noises). Methods robust to both types of deviation are somewhat overlooked in the literature. We further argue that denoising can be more effective if the model is learned from all available labeled and unlabeled samples, because the intrinsic geometry of the sample manifold can be better constructed from more data points. In this paper, we propose a semi-supervised robust discriminative classification method, based on the least-squares formulation of linear discriminant analysis, that detects sample-outliers and feature-noises simultaneously using both labeled training data and unlabeled testing data. We conduct experiments on a synthetic dataset, several benchmark semi-supervised learning datasets, and two datasets for the diagnosis of brain neurodegenerative diseases (Parkinson's and Alzheimer's). Robust machine learning methods can be of particular benefit in neurodegenerative disease diagnosis because neuroimaging data are inherently noisy. Our results show that our method outperforms the baseline and several state-of-the-art methods in terms of both accuracy and the area under the ROC curve.
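As background for the least-squares formulation of linear discriminant analysis that the abstract builds on, the following minimal sketch illustrates the classical two-class result: regressing centered class-indicator targets on centered features yields a weight vector parallel to the Fisher LDA direction. This is not the proposed robust semi-supervised method; it contains no outlier or noise modeling and uses no unlabeled data. The function names (`ls_lda_direction`, `fisher_lda_direction`, `cosine`) and the synthetic Gaussian data are assumptions made purely for illustration.

```python
import numpy as np


def ls_lda_direction(X, y):
    """Least-squares fit of centered +/-1 targets on centered features."""
    Xc = X - X.mean(axis=0)                # center the features
    t = np.where(y == 1, 1.0, -1.0)        # class-indicator targets
    tc = t - t.mean()                      # center the targets (absorbs the intercept)
    w, *_ = np.linalg.lstsq(Xc, tc, rcond=None)
    return w


def fisher_lda_direction(X, y):
    """Classical Fisher direction: Sw^{-1} (mu_1 - mu_0)."""
    X1, X0 = X[y == 1], X[y == 0]
    mu1, mu0 = X1.mean(axis=0), X0.mean(axis=0)
    # Within-class scatter matrix, summed over both classes
    Sw = (X1 - mu1).T @ (X1 - mu1) + (X0 - mu0).T @ (X0 - mu0)
    return np.linalg.solve(Sw, mu1 - mu0)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical synthetic data: two Gaussian classes in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 5)),
               rng.normal(1.0, 1.0, (40, 5))])
y = np.array([0] * 60 + [1] * 40)

w_ls = ls_lda_direction(X, y)
w_lda = fisher_lda_direction(X, y)
print("cosine similarity:", cosine(w_ls, w_lda))
```

The cosine similarity should be numerically very close to 1, which shows that the two directions coincide up to scale. A robust variant along the lines the abstract describes would presumably modify this least-squares objective, but the details are not given in the abstract and are not reproduced here.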