Generative Model With Coordinate Metric Learning for Object Recognition Based on 3D Models.

Publication Info

IEEE Trans Image Process. 2018 Dec;27(12):5813-5826. doi: 10.1109/TIP.2018.2858553. Epub 2018 Jul 23.

Abstract

One of the bottlenecks in acquiring a perfect database for deep learning is the tedious process of collecting and labeling data. In this paper, we propose a generative model trained with synthetic images rendered from 3D models, which reduces the burden of collecting real training data and makes the background conditions more realistic. Our architecture is composed of two sub-networks: a semantic foreground object reconstruction network based on Bayesian inference, and a classification network trained with a multi-triplet cost that avoids overfitting to the monotone synthetic object surfaces and exploits the accurate information available for synthetic images, such as object poses and lighting conditions, which is helpful for recognizing regular photos. First, our generative model with metric learning uses the additional foreground object channels generated by the semantic foreground object reconstruction sub-network to recognize the original input images. A multi-triplet cost function based on poses is used for metric learning, which makes it possible to train an effective categorical classifier purely on synthetic data. Second, we design a coordinate training strategy that applies adaptive noise to the inputs of both concatenated sub-networks so that they benefit from each other, avoiding the inharmonious parameter tuning caused by the different convergence speeds of the two sub-networks. Our architecture achieves state-of-the-art accuracy of 50.5% on the ShapeNet database despite the data-migration obstacle from synthetic to real images. This pipeline makes it possible to perform recognition on real images based only on 3D models. Our code is available at https://github.com/wangyida/gm-cml.
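The multi-triplet cost described in the abstract pairs each anchor with several positives and negatives. The paper's exact formulation and its pose-based sampling are not given here, so the following is only a minimal hinge-style sketch with assumed names (`multi_triplet_loss`, `margin`), averaging one triplet term over every positive/negative combination for a single anchor embedding:

```python
import numpy as np

def multi_triplet_loss(anchor, positives, negatives, margin=0.2):
    """Hinge-style triplet loss for one anchor embedding of shape (d,).

    positives: (P, d) embeddings that should stay close to the anchor
               (e.g. same category, similar pose).
    negatives: (N, d) embeddings that should be pushed at least `margin`
               farther away than every positive.
    """
    pos_d = np.linalg.norm(positives - anchor, axis=1)   # (P,) distances
    neg_d = np.linalg.norm(negatives - anchor, axis=1)   # (N,) distances
    # Every (positive, negative) combination contributes one hinge term:
    # max(0, d(a, p) - d(a, n) + margin), broadcast to a (P, N) grid.
    losses = np.maximum(0.0, pos_d[:, None] - neg_d[None, :] + margin)
    return losses.mean()
```

With negatives already farther than every positive by more than the margin, the loss is zero; otherwise the offending pairs contribute positive terms, which is what lets a classifier trained purely on synthetic data learn a pose-aware embedding.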
