Cross-species Data Classification by Domain Adaptation via Discriminative Heterogeneous Maximum Mean Discrepancy.

Authors

Li Limin, Cai Menglan

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2021 Jan-Feb;18(1):312-324. doi: 10.1109/TCBB.2019.2914103. Epub 2021 Feb 3.

Abstract

Cross-species or cross-platform data classification is a challenging problem in bioinformatics: the goal is to classify data samples from one species or platform using labeled data samples from another. Traditional classification methods cannot be used in this case, since the samples from the two species/platforms may have different feature spaces or follow different statistical distributions. Domain adaptation is a strategy that can be used to deal with this problem. A major challenge in domain adaptation is how to reduce the difference and correct the drift between the source and target domains in the heterogeneous case, where the feature spaces of the two domains differ. It has been shown theoretically that probability divergences between the two domains, such as maximum mean discrepancy (MMD), play an important role in the generalization bound for domain adaptation. However, they are rarely used for heterogeneous domain adaptation because the domains have different feature spaces. In this work, we propose a heterogeneous domain adaptation approach that makes use of MMD, measuring the probability divergence in an embedded low-dimensional common subspace. Our proposed discriminative heterogeneous MMD approach (DMMD) aims to find new representations of the samples in a common subspace by minimizing the domain probability divergence while preserving the known discriminative information. A conjugate gradient algorithm on a Grassmann manifold is applied to solve the nonlinear DMMD model. Our experiments on both simulated and benchmark machine learning datasets show that our approach outperforms other state-of-the-art approaches for heterogeneous domain adaptation. Finally, we apply our approach to a cross-platform dataset and a cross-species dataset, and the results show its effectiveness.
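To make the central quantity concrete, the following is a minimal sketch of evaluating a squared MMD between source and target samples after each domain is projected into a shared low-dimensional subspace. It is not the authors' DMMD implementation: the RBF kernel, the biased estimator, the random orthonormal projections, and all function names are assumptions chosen for illustration. The sketch only evaluates the divergence for fixed projections; it omits both the discriminative term and the conjugate gradient optimization on the Grassmann manifold described in the abstract.

```python
import numpy as np


def orthonormal_basis(dim, k, rng):
    """Random dim x k matrix with orthonormal columns (hypothetical helper)."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(zs, zt, gamma=1.0):
    """Biased estimate of the squared MMD between two samples in the same space."""
    return (
        rbf_kernel(zs, zs, gamma).mean()
        + rbf_kernel(zt, zt, gamma).mean()
        - 2.0 * rbf_kernel(zs, zt, gamma).mean()
    )


def projected_mmd2(xs, xt, ps, pt, gamma=1.0):
    """Squared MMD after projecting each domain with its own matrix.

    xs: (n_s, d_s) source samples, ps: (d_s, k) source projection.
    xt: (n_t, d_t) target samples, pt: (d_t, k) target projection.
    The feature dimensions d_s and d_t may differ (heterogeneous case).
    """
    return mmd2(xs @ ps, xt @ pt, gamma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.standard_normal((40, 10))  # source domain, 10 features (synthetic)
    xt = rng.standard_normal((30, 6))   # target domain, 6 features (synthetic)
    ps = orthonormal_basis(10, 2, rng)
    pt = orthonormal_basis(6, 2, rng)
    print(projected_mmd2(xs, xt, ps, pt))
```

In a full method along the lines the abstract describes, the projection matrices would be the optimization variables rather than random draws, and the objective would combine this divergence with a term that preserves label information.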
