Liu Feng, Zhang Guangquan, Lu Jie
IEEE Trans Neural Netw Learn Syst. 2020 Dec;31(12):5588-5602. doi: 10.1109/TNNLS.2020.2973293. Epub 2020 Nov 30.
Domain adaptation leverages the knowledge in one domain, the source domain, to improve learning efficiency in another domain, the target domain. Existing research on heterogeneous domain adaptation is relatively well developed, but only for situations where the target domain contains at least a few labeled instances. In contrast, heterogeneous domain adaptation with an unlabeled target domain has not been well studied. To contribute to research in this emerging field, this article presents: 1) an unsupervised knowledge transfer theorem that guarantees the correctness of transferring knowledge; and 2) a principal angle-based metric that measures the distance between two pairs of domains, where one pair comprises the original source and target domains and the other comprises the homogeneous representations of those two domains. The theorem and the metric are implemented in a new transfer model, called Grassmann-linear monotonic maps-geodesic flow kernel (GLG), which is specifically designed for heterogeneous unsupervised domain adaptation (HeUDA). The linear monotonic maps (LMMs) satisfy the conditions of the theorem and are used to construct homogeneous representations of the heterogeneous domains. The metric indicates the extent to which these homogeneous representations preserve the information in the original source and target domains. By minimizing the proposed metric, the GLG model learns homogeneous representations of the heterogeneous domains and transfers knowledge through them via a geodesic flow kernel (GFK). To evaluate the model, five public data sets were reorganized into ten HeUDA tasks across three applications: cancer detection, credit assessment, and text classification. The experiments demonstrate that the proposed model outperforms the existing baselines.
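Both the principal angle-based metric and the GFK rest on comparing linear subspaces as points on a Grassmann manifold. As a hedged illustration only (this is not the authors' metric, which compares two pairs of domains in a way the abstract does not specify), the sketch below computes the principal angles between the column spaces of two matrices with NumPy. It assumes each domain is summarized by a basis matrix, for example its top PCA directions; the function names `principal_angles` and `subspace_distance` are hypothetical.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B.

    Assumption: A and B are (n_features x k) basis matrices with full column rank.
    """
    # Orthonormal bases for each subspace via reduced QR
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles (descending)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to [-1, 1] to guard against round-off before taking arccos
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def subspace_distance(A, B):
    """Arc-length (geodesic) distance on the Grassmannian: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(A, B)))

# Usage: the xy-plane and the xz-plane in R^3 share one direction (angle 0)
# and are orthogonal in the other (angle pi/2).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(principal_angles(A, B), subspace_distance(A, B))
```

A distance built from principal angles depends only on the subspaces, not on the particular bases chosen, which is why it suits comparing representations on a Grassmann manifold. SciPy's `scipy.linalg.subspace_angles` performs the same computation (returning angles in descending order).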