Image Recognition by Predicted User Click Feature With Multidomain Multitask Transfer Deep Network.

Publication information

IEEE Trans Image Process. 2019 Dec;28(12):6047-6062. doi: 10.1109/TIP.2019.2921861. Epub 2019 Jun 28.

Abstract

The click feature of an image, defined as a vector of user click counts derived from click data, has been shown to be effective in reducing the semantic gap in image recognition. Unfortunately, most traditional image recognition datasets do not contain click data. To address this problem, researchers have begun to build click prediction models from auxiliary datasets that contain click information and to adapt these predictors to common click-free datasets for different tasks. This approach can be customized to our problem, but it has two main limitations: 1) the predicted click feature often performs poorly in the recognition task, since the prediction model is constructed independently of the subsequent recognition problem; and 2) transferring the predictor from one dataset to another is challenging because of the large cross-domain diversity. In this paper, we devise a multitask and multidomain deep network with varied modals (MTMDD-VM) that formulates image recognition and click prediction in a unified framework. Datasets with and without click information are integrated in training. Furthermore, a nonlinear word embedding with a position-sensitive loss function is designed to discover the visual click correlation. We evaluate the proposed method on three public dog breed image datasets, using the Clickture-Dog dataset as the auxiliary dataset that provides click data. The experimental results show that: 1) the nonlinear word embedding and position-sensitive loss function greatly enhance the predicted click feature in the recognition task, yielding a 32% improvement in accuracy; 2) the multitask learning framework improves accuracy in both image recognition and click prediction; and 3) unified training on the combined dataset with and without click data further improves performance. Compared with state-of-the-art methods, the proposed approach not only achieves much higher accuracy but also shows good scalability and one-shot learning ability.
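The abstract defines the click feature as a per-image vector of user click counts and trains recognition and click prediction jointly, with click-free images also taking part in training. The Python sketch below illustrates both ideas under stated assumptions only; it is not the MTMDD-VM architecture. The click-log format `(image_id, query, clicks)`, the query vocabulary, and the helper names `build_click_features` and `joint_loss` are hypothetical. Plain squared error stands in for the paper's position-sensitive loss, which the abstract does not specify, and `click_weight` is an arbitrary placeholder.

```python
import math
from collections import defaultdict

# Hypothetical click-log format (an assumption, not the paper's data schema):
# (image_id, query_text, click_count) triples.


def build_click_features(click_log, vocabulary):
    """Return {image_id: click count vector indexed by `vocabulary`}."""
    index = {query: i for i, query in enumerate(vocabulary)}
    features = defaultdict(lambda: [0] * len(vocabulary))
    for image_id, query, clicks in click_log:
        if query in index:  # queries outside the vocabulary are dropped
            features[image_id][index[query]] += clicks
    return dict(features)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_error(predicted, target):
    """Stand-in click-prediction loss; NOT the paper's position-sensitive loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def joint_loss(batch, click_weight=0.5):
    """Average multitask loss over a batch mixing both kinds of datasets.

    Each sample is (class_logits, label, predicted_clicks, click_target), where
    click_target is None for images from a click-free dataset. Such samples
    contribute only the recognition term; click_weight is a hypothetical value.
    """
    total = 0.0
    for logits, label, predicted_clicks, click_target in batch:
        total += cross_entropy(logits, label)
        if click_target is not None:
            total += click_weight * squared_error(predicted_clicks, click_target)
    return total / len(batch)


if __name__ == "__main__":
    vocab = ["husky", "malamute", "corgi"]
    log = [("img1", "husky", 5), ("img1", "malamute", 2),
           ("img2", "corgi", 7), ("img2", "puppy", 3)]
    print(build_click_features(log, vocab))  # {'img1': [5, 2, 0], 'img2': [0, 0, 7]}
```

Masking the click term, rather than discarding click-free images, lets every image contribute to the shared recognition objective. This is one simple way to read the abstract's "unified training using the combined dataset with and without click data".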
