
Heuristic Attention Representation Learning for Self-Supervised Pretraining.

Affiliations

Department of Computer Science and Information Engineering, National Central University, Taoyuan 3200, Taiwan.

AI Research Center, Hon Hai Research Institute, Taipei 114699, Taiwan.

Publication Information

Sensors (Basel). 2022 Jul 10;22(14):5169. doi: 10.3390/s22145169.

Abstract

Recently, self-supervised learning methods have proven powerful and efficient at yielding robust representations by maximizing the similarity between different augmented views in embedding space. However, a key challenge arises when the views are generated by random cropping: the semantic content may differ across views, making the similarity-maximization objective inappropriate. We tackle this problem by introducing Heuristic Attention Representation Learning (HARL). This self-supervised framework relies on a joint embedding architecture in which two neural networks are trained to produce similar embeddings for different augmented views of the same image. The HARL framework incorporates prior object-level visual attention by generating a heuristic mask proposal for each training image, and it maximizes similarity between abstract object-level embeddings in vector space rather than between whole-image representations as in previous work. As a result, HARL extracts high-quality semantic representations from each training sample and outperforms self-supervised baselines on several downstream tasks. In addition, we provide efficient techniques based on conventional computer vision and deep learning methods for generating heuristic mask proposals on natural-image datasets. HARL achieves a +1.3% improvement on the ImageNet semi-supervised learning benchmark and a +0.9% improvement in AP on the COCO object detection task over the previous state-of-the-art method, BYOL. Our code implementation is available for both the TensorFlow and PyTorch frameworks.
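The core idea in the abstract can be sketched numerically: pool the backbone's spatial features over the heuristic object mask instead of over the whole image, then apply a BYOL-style similarity loss between the two views' pooled embeddings. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' released implementation; the function names (`masked_pool`, `byol_similarity_loss`) and the feature-map layout are hypothetical.

```python
import numpy as np

def masked_pool(feature_map, mask, eps=1e-8):
    """Average-pool spatial features over the object region.

    feature_map: (H, W, C) array from a backbone (hypothetical layout).
    mask: (H, W) binary array marking the heuristic object proposal.
    Returns a (C,) object-level embedding.
    """
    weights = mask[..., None]                                  # (H, W, 1)
    pooled = (feature_map * weights).sum(axis=(0, 1))
    return pooled / max(float(weights.sum()), eps)             # mean over masked cells

def byol_similarity_loss(z1, z2, eps=1e-8):
    """BYOL-style objective: 2 - 2*cos(z1, z2), minimized when embeddings align."""
    z1 = z1 / max(np.linalg.norm(z1), eps)
    z2 = z2 / max(np.linalg.norm(z2), eps)
    return 2.0 - 2.0 * float(np.dot(z1, z2))

# Two augmented views of one image would each yield a feature map and a
# (warped) copy of the same heuristic mask; the loss compares the two
# object-level embeddings rather than whole-image pooled vectors.
view1 = np.random.rand(4, 4, 8)
view2 = np.random.rand(4, 4, 8)
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0                  # toy object proposal
loss = byol_similarity_loss(masked_pool(view1, mask), masked_pool(view2, mask))
```

Identical embeddings give a loss of 0 and orthogonal ones give 2, so minimizing this quantity pulls the object-level embeddings of the two views together, which is the similarity-maximization behavior the abstract describes.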

