Lightweight DeepLabv3+ for Semantic Food Segmentation

Authors

Muñoz Bastián, Martínez-Arroyo Angela, Acevedo Constanza, Aguilar Eduardo

Affiliations

Departamento de Ingeniería y Sistemas de Computación, Universidad Católica del Norte, Av. Angamos 0610, Antofagasta 1270709, Chile.

Centro de Micro-Bioinnovación (CMBi), Escuela de Nutrición y Dietética, Facultad de Farmacia, Universidad de Valparaíso, Valparaíso 2360102, Chile.

Publication information

Foods. 2025 Apr 9;14(8):1306. doi: 10.3390/foods14081306.

DOI:10.3390/foods14081306

PMID:40282708

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12026278/

Abstract

Advancements in artificial intelligence, particularly in computer vision, have driven the research and development of visual food analysis systems focused primarily on enhancing people's well-being. Food analysis can be performed at various levels of granularity, with food segmentation being a major component of numerous real-world applications. Deep learning-based methodologies have demonstrated promising results in food segmentation; however, many of these approaches demand high computational resources, making them impractical for low-performance devices. In this research, a novel, lightweight, deep learning-based method for semantic food segmentation is proposed. To achieve this, the state-of-the-art DeepLabv3+ model was adapted by optimizing the backbone with the lightweight network EfficientNet-B1, replacing the Atrous Spatial Pyramid Pooling (ASPP) in the neck with Cascade Waterfall ASPP (CWASPP), and refining the encoder output using the squeeze-and-excitation attention mechanism. To validate the method, four publicly available food datasets were selected. Additionally, a new food segmentation dataset consisting of self-acquired food images was introduced and included in the validation. The results demonstrate that the proposed method performs better than or comparably to state-of-the-art techniques while requiring significantly lower computational cost. In conclusion, this research demonstrates the potential of deep learning to perform food image segmentation on low-performance stand-alone devices, paving the way for more efficient, cost-effective, and scalable food analysis applications.
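
The abstract names squeeze-and-excitation as the mechanism used to refine the encoder output but gives no implementation details. The following is a minimal NumPy sketch of a generic squeeze-and-excitation step in its commonly described form (global average pooling, a two-layer bottleneck, sigmoid gating), not the authors' implementation. The function name `squeeze_excite`, the reduction ratio, the feature-map shape, and the random weights are all assumptions chosen for illustration.

```python
import numpy as np


def squeeze_excite(features, w1, b1, w2, b2):
    """Reweight the channels of a (C, H, W) feature map.

    Hypothetical helper for illustration; weights would normally be learned.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = features.mean(axis=(1, 2))                  # shape (C,)
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)           # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))      # shape (C,), in (0, 1)
    # Scale: multiply every channel by its gate.
    return features * gates[:, None, None]


# Assumed toy sizes: 8 channels, reduction ratio 4, 16x16 spatial grid.
rng = np.random.default_rng(0)
channels, reduction = 8, 4
features = rng.standard_normal((channels, 16, 16))
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
b1 = np.zeros(channels // reduction)
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1
b2 = np.zeros(channels)

refined = squeeze_excite(features, w1, b1, w2, b2)
print(refined.shape)
```

Because each gate lies strictly between 0 and 1, the step can only attenuate channels relative to one another. It adds two small matrix products per feature map, which is consistent with the abstract's emphasis on low computational cost.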
