IEEE Trans Cybern. 2020 Feb;50(2):739-752. doi: 10.1109/TCYB.2018.2872800. Epub 2018 Oct 15.
The performance of a classifier might greatly deteriorate due to missing data, and many different techniques have been developed to handle this problem. In this paper, we address missing data from a novel transfer learning perspective and show that when an additive least squares support vector machine (LS-SVM) is adopted, model transfer learning can enhance classification performance on incomplete training datasets. A novel transfer-based additive LS-SVM classifier is accordingly proposed. The method also simultaneously estimates the influence of the classification errors caused by each incomplete sample using a fast leave-one-out cross-validation strategy, which serves as an alternative way to clean the training data and further improve data quality. The proposed method has been applied to seven public datasets. The experimental results indicate that it achieves at least comparable, if not better, performance than case deletion, mean imputation, and k-nearest neighbor imputation combined with standard LS-SVM and support vector machine classifiers. Moreover, a detailed case study on a community healthcare dataset is presented, which particularly highlights the contributions and benefits of the proposed method in this real-world application.
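For reference, the standard binary LS-SVM at the core of the paper's method is trained by solving a single linear system rather than a quadratic program. The sketch below is a minimal illustration of that baseline classifier only, not the paper's transfer-based additive variant or its fast leave-one-out strategy; the function names, RBF kernel choice, and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    # Solve the LS-SVM dual linear system (Suykens-style formulation):
    #   [ 0   y^T            ] [ b     ]   [ 0 ]
    #   [ y   Omega + I/gamma] [ alpha ] = [ 1 ]
    # where Omega_ij = y_i * y_j * K(x_i, x_j).
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual coefficients alpha

def lssvm_predict(X_train, y_train, b, alpha, X_test, sigma=1.0):
    # Decision function: sign( sum_i alpha_i y_i K(x, x_i) + b ).
    K = rbf_kernel(X_test, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical toy data: two well-separated clusters with labels -1 / +1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
b, alpha = lssvm_train(X, y)
preds = lssvm_predict(X, y, b, alpha, X)
```

Because all constraints are equalities, every training point gets a nonzero `alpha_i`, which is what makes closed-form leave-one-out formulas (as exploited by the paper) tractable for LS-SVMs.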