Multi-Scale Multi-View Deep Feature Aggregation for Food Recognition.

Publication information

IEEE Trans Image Process. 2020;29:265-276. doi: 10.1109/TIP.2019.2929447. Epub 2019 Jul 29.

Abstract

Food recognition has recently received growing attention in image processing and computer vision because of its potential applications in human health. Most existing methods directly extract deep visual features via convolutional neural networks (CNNs) for food recognition. Such methods ignore the characteristics of food images and thus struggle to achieve optimal recognition performance. In contrast to general objects, food images typically do not exhibit a distinctive spatial arrangement or common semantic patterns. In this paper, we propose a multi-scale multi-view feature aggregation (MSMVFA) scheme for food recognition. MSMVFA aggregates high-level semantic features, mid-level attribute features, and deep visual features into a unified representation. These three types of features describe the food image at different granularities, so the aggregated features are the most likely to capture the semantics of food images. To this end, we use additional ingredient knowledge to obtain the mid-level attribute representation via ingredient-supervised CNNs. High-level semantic features and deep visual features are extracted from class-supervised CNNs. Because food images often lack a distinctive spatial layout, MSMVFA fuses multi-scale CNN activations for each type of feature to make the aggregated features more discriminative and invariant to geometric deformation. The resulting features are thus more robust, comprehensive, and discriminative through two-level fusion: multi-scale fusion within each type of feature and multi-view aggregation across the different types of features. In addition, MSMVFA is general, and different deep networks can easily be plugged into the scheme. Extensive experiments and evaluations demonstrate that our method achieves state-of-the-art Top-1 recognition accuracy on three popular large-scale food benchmark datasets. Furthermore, we expect this paper to further the agenda of food recognition in the image processing and computer vision community.
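
The two-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion operators the paper uses, so the choices here are assumptions made purely for illustration: an element-wise mean across scales followed by L2 normalization for the multi-scale step, and plain concatenation for the multi-view step. The function names (`fuse_scales`, `aggregate_views`) and the toy feature vectors are hypothetical and stand in for real CNN activations.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_scales(per_scale: Sequence[Vector]) -> Vector:
    """Level 1 (assumed operator): average one feature type over its scales,
    then L2-normalize so every view contributes on a comparable magnitude."""
    if not per_scale:
        raise ValueError("need at least one scale")
    dim = len(per_scale[0])
    if any(len(v) != dim for v in per_scale):
        raise ValueError("all scales must share the same dimensionality")
    n = len(per_scale)
    mean = [sum(v[i] for v in per_scale) / n for i in range(dim)]
    return l2_normalize(mean)


def aggregate_views(views: Dict[str, Sequence[Vector]],
                    order: Sequence[str]) -> Vector:
    """Level 2 (assumed operator): concatenate the fused vector of each view
    in a fixed order to form one unified representation."""
    unified: Vector = []
    for name in order:
        unified.extend(fuse_scales(views[name]))
    return unified


# Toy stand-ins for CNN activations: three views, two scales each.
toy_views = {
    "semantic": [[1.0, 0.0], [1.0, 0.0]],   # high-level semantic features
    "attribute": [[0.0, 2.0], [0.0, 4.0]],  # ingredient-supervised features
    "visual": [[3.0, 4.0], [3.0, 4.0]],     # deep visual features
}
unified = aggregate_views(toy_views, ["semantic", "attribute", "visual"])
print(len(unified))  # 6: three views of dimension 2 each
```

Normalizing each view before concatenation keeps one feature type from dominating the unified vector simply because its activations have a larger magnitude; the actual scheme may weight or pool the views differently.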

