AUC-Oriented Domain Adaptation: From Theory to Algorithm

Authors

Yang Zhiyong, Xu Qianqian, Bao Shilong, Wen Peisong, He Yuan, Cao Xiaochun, Huang Qingming

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14161-14174. doi: 10.1109/TPAMI.2023.3303943. Epub 2023 Nov 3.

Abstract

The Area Under the ROC Curve (AUC) is a crucial metric in machine learning and is often a reasonable choice for applications such as disease prediction and fraud detection, where datasets frequently exhibit a long-tailed distribution. However, most existing AUC-oriented learning methods assume that training and test data are drawn from the same distribution, and how to handle domain shift remains largely open. This paper presents an early attempt to tackle AUC-oriented Unsupervised Domain Adaptation (UDA), denoted AUCUDA hereafter. Specifically, we first construct a generalization bound that exploits a new distributional discrepancy for AUC. The critical challenge is that the AUC risk cannot be expressed as a sum of independent loss terms, so the standard theoretical technique does not apply. We propose a new result that not only addresses this interdependency issue but also yields a much sharper bound under weaker assumptions on the loss function. Turning theory into practice raises a further obstacle: the original discrepancy requires complete annotations on the target domain, which is incompatible with UDA. To fix this issue, we propose a pseudo-labeling strategy and present an end-to-end training framework. Finally, empirical studies on five real-world datasets demonstrate the efficacy of our framework.
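To make the interdependency issue concrete, the sketch below writes the empirical AUC and a surrogate AUC risk as averages over positive-negative pairs rather than over individual samples. It is a minimal illustration under stated assumptions: the function names and the squared surrogate loss with `margin=1.0` are hypothetical choices for exposition, not the loss, discrepancy, or bound used in the paper.

```python
from itertools import product


def _split(scores, labels):
    """Separate scores into positive (label 1) and negative (label 0) groups."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = _split(scores, labels)
    total = 0.0
    for sp, sn in product(pos, neg):
        total += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return total / (len(pos) * len(neg))


def pairwise_auc_risk(scores, labels, margin=1.0):
    """Surrogate AUC risk: mean squared loss (margin - (s_pos - s_neg))^2 over pairs.

    Hypothetical surrogate for illustration only. Each positive score appears in
    len(neg) terms and each negative in len(pos) terms, so the summands are not
    independent, unlike a standard per-sample empirical risk.
    """
    pos, neg = _split(scores, labels)
    losses = [(margin - (sp - sn)) ** 2 for sp, sn in product(pos, neg)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(empirical_auc(scores, labels))
    print(pairwise_auc_risk(scores, labels))
```

With two positives and two negatives there are four pairs, and changing a single score alters two of the four loss terms at once; this coupling is what prevents treating the AUC risk as a sum of independent terms.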
