Alam Md Ashad, Fukumizu Kenji, Wang Yu-Ping
Department of Biomedical Engineering, Tulane University, New Orleans, LA 70118, USA.
The Institute of Statistical Mathematics, Tachikawa, Tokyo 190-8562, Japan.
Neurocomputing (Amst). 2018 Aug 23;304:12-29. doi: 10.1016/j.neucom.2018.04.008. Epub 2018 May 3.
Many unsupervised kernel methods rely on estimating the kernel covariance operator (kernel CO) or the kernel cross-covariance operator (kernel CCO). Both estimators are sensitive to contaminated data, even when bounded positive definite kernels are used. To the best of our knowledge, there are few well-founded robust kernel methods for statistical unsupervised learning. In addition, although the influence function (IF) of an estimator can characterize its robustness, asymptotic properties and standard error, the IF of standard kernel canonical correlation analysis (standard kernel CCA) has not yet been derived. To fill this gap, we first propose a robust kernel covariance operator (robust kernel CO) and a robust kernel cross-covariance operator (robust kernel CCO) based on a generalized loss function instead of the quadratic loss function. Second, we derive the IF for the robust kernel CCO and the standard kernel CCA. Using the IF of the standard kernel CCA, we can detect influential observations from two sets of data. Finally, we propose a method based on the robust kernel CO and the robust kernel CCO, called robust kernel CCA, which is less sensitive to noise than the standard kernel CCA. The introduced principles can also be applied to many other kernel methods involving the kernel CO or kernel CCO. Our experiments on both synthesized and imaging genetics data demonstrate that the proposed IF of the standard kernel CCA can identify outliers. The proposed robust kernel CCA method also performs better than the standard kernel CCA on both ideal and contaminated data.
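The idea of replacing the quadratic loss with a generalized loss can be illustrated on the simplest ingredient of a kernel CO, the kernel mean element. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel, a Huber loss with a median-based threshold, and a kernelized iteratively reweighted least squares loop. The function names `gaussian_gram`, `robust_mean_weights` and `robust_centered_gram` are hypothetical and do not come from the abstract.

```python
import numpy as np


def gaussian_gram(X, sigma=1.0):
    """Gram matrix of a Gaussian kernel (a bounded positive definite kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def _residual_norms(K, w):
    """RKHS distances ||Phi(x_i) - sum_j w_j Phi(x_j)|| for every i."""
    Kw = K @ w
    r2 = np.diag(K) - 2.0 * Kw + w @ Kw
    return np.sqrt(np.maximum(r2, 1e-12))


def robust_mean_weights(K, c=None, n_iter=100, tol=1e-8):
    """Weights of a Huber-type robust kernel mean (assumed loss, see lead-in).

    With the quadratic loss every weight would equal 1/n; here points that
    lie far from the current mean in the RKHS are down-weighted.
    """
    n = K.shape[0]
    w = np.full(n, 1.0 / n)  # start from the standard (quadratic-loss) mean
    if c is None:
        # Assumption: Huber threshold set to the median initial residual.
        c = np.median(_residual_norms(K, w))
    for _ in range(n_iter):
        r = _residual_norms(K, w)
        phi = np.where(r <= c, 1.0, c / r)  # psi(r) / r for the Huber loss
        w_new = phi / phi.sum()
        converged = np.abs(w_new - w).sum() < tol
        w = w_new
        if converged:
            break
    return w


def robust_centered_gram(K, w):
    """Gram matrix of features centered at the weighted (robust) mean."""
    n = K.shape[0]
    H = np.eye(n) - np.outer(np.ones(n), w)  # row i is e_i - w
    return H @ K @ H.T
```

With uniform weights, `robust_centered_gram` reduces to the usual centered Gram matrix, so the sketch differs from the standard estimator only through the down-weighting of points that are far from the mean in feature space.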