Yu Yi, Tang Suhua, Aizawa Kiyoharu, Aizawa Akiko
IEEE Trans Neural Netw Learn Syst. 2019 Apr;30(4):1250-1258. doi: 10.1109/TNNLS.2018.2856253. Epub 2018 Aug 10.
In this work, travel destinations and business locations are taken as venues. Discovering a venue from a photograph is important for visual context-aware applications. Unfortunately, little attention has been paid to complicated real-world images such as user-generated venue photographs. Our goal is fine-grained venue discovery from heterogeneous social multimodal data. To this end, we propose a novel deep learning model, category-based deep canonical correlation analysis. Given a photograph as input, the model exploits the cross-modal correlation between the photograph and the textual descriptions of venues to perform: 1) exact venue search (find the venue where the photograph was taken) and 2) group venue search (find relevant venues that have the same category as the photograph). In this model, data from different modalities are projected into the same space via deep networks. Pairwise correlation (between data of different modalities from the same venue), used for exact venue search, and category-based correlation (between data of different modalities from different venues with the same category), used for group venue search, are jointly optimized. Because a single photograph cannot fully reflect the rich textual description of a venue, the number of photographs per venue in the training phase is increased to capture more aspects of each venue. We build a new venue-aware multimodal data set by integrating Wikipedia featured articles and Foursquare venue photographs. Experimental results on this data set confirm the feasibility of the proposed method. Moreover, evaluation on another publicly available data set confirms that the proposed method outperforms state-of-the-art methods for cross-modal retrieval between images and text.
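The sketch below illustrates, under stated assumptions, the shape of the joint objective described in the abstract: two per-modality networks project image and text features into a shared space, and a pairwise (same-venue) term is combined with a category-based (same-category, different-venue) term. This is not the authors' code; it assumes PyTorch, uses cosine similarity as a simple stand-in for the paper's DCCA correlation objective, and the feature dimensions, network sizes, and the weight `alpha` are illustrative.

```python
# A minimal sketch (not the authors' implementation) of the joint objective,
# assuming PyTorch and pre-extracted image/text feature vectors.
# Cosine similarity stands in for the DCCA correlation term for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityNet(nn.Module):
    """Projects one modality into the shared space via a small MLP."""
    def __init__(self, in_dim, shared_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, shared_dim),
        )

    def forward(self, x):
        # Unit-normalize so dot products are cosine similarities.
        return F.normalize(self.net(x), dim=-1)

def joint_loss(img_z, txt_z, categories, alpha=0.5):
    """Pairwise term: align image i with the text of the same venue i.
    Category term: align image i with texts of other venues of its category."""
    sim = img_z @ txt_z.t()                        # B x B cosine similarities
    pairwise = -sim.diag().mean()                  # same-venue image/text pairs
    same_cat = (categories[:, None] == categories[None, :]).float()
    same_cat.fill_diagonal_(0)                     # exclude the exact pairs
    cat_term = -(sim * same_cat).sum() / same_cat.sum().clamp(min=1)
    return pairwise + alpha * cat_term

# Toy usage with random features (4096-d image, 300-d text, both assumed):
img_net, txt_net = ModalityNet(4096), ModalityNet(300)
imgs, txts = torch.randn(8, 4096), torch.randn(8, 300)
cats = torch.randint(0, 3, (8,))                   # venue category labels
loss = joint_loss(img_net(imgs), txt_net(txts), cats)
loss.backward()
```

At retrieval time, both search modes reduce to nearest-neighbor lookup in the shared space: exact venue search returns the venue whose text embedding is closest to the query photograph's embedding, while group venue search returns the top-ranked venues, which the category-based term encourages to share the query's category.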