Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA.
Department of Nutrition Science, Purdue University, West Lafayette, IN 47907, USA.
Nutrients. 2023 Jun 15;15(12):2751. doi: 10.3390/nu15122751.
Food classification is a basic step of image-based dietary assessment: it predicts the types of food in each input image. However, foods in real-world scenarios typically follow a long-tailed distribution, in which a small number of food types are consumed far more frequently than the rest. This causes severe class imbalance and hinders overall performance. In addition, none of the existing long-tailed classification methods focus on food data, which can be more challenging because of the inter-class similarity and intra-class diversity among food images. In this work, two new benchmark datasets for long-tailed food classification are introduced, Food101-LT and VFN-LT, where the number of samples per class in VFN-LT follows a real-world long-tailed food distribution. A novel two-phase framework is then proposed to address class imbalance by (1) undersampling the head classes to remove redundant samples while preserving the learned information through knowledge distillation and (2) oversampling the tail classes through visually aware data augmentation. A comparison with existing state-of-the-art long-tailed classification methods shows the effectiveness of the proposed framework, which obtains the best performance on both Food101-LT and VFN-LT. The results demonstrate the potential of the proposed method for related real-life applications.
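The general idea of rebalancing a long-tailed label set can be illustrated with a minimal sketch. It is not the framework described above: plain random selection stands in for the redundancy-based undersampling, duplication with replacement stands in for the visually aware augmentation, and knowledge distillation is omitted entirely. The function name `rebalance_indices`, the parameter `target_per_class`, and the toy labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def rebalance_indices(labels, target_per_class, seed=0):
    """Return sample indices with exactly target_per_class entries per class.

    Assumption: random selection is a placeholder for the selection and
    augmentation strategies of the actual method, which are not shown here.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    balanced = []
    for label, indices in sorted(by_class.items()):
        if len(indices) > target_per_class:
            # Head class: undersample without replacement.
            chosen = rng.sample(indices, target_per_class)
        else:
            # Tail class: keep every sample, then draw extras with replacement.
            extra = rng.choices(indices, k=target_per_class - len(indices))
            chosen = indices + extra
        balanced.extend(chosen)
    return balanced


# Toy long-tailed label list: one head class and two smaller classes.
labels = ["rice"] * 50 + ["apple"] * 10 + ["durian"] * 2
balanced = rebalance_indices(labels, target_per_class=10)
counts = Counter(labels[i] for i in balanced)
```

In this toy setting every class ends up with the same number of indices. The duplicated tail samples would normally be passed through an augmentation step so that the model does not see identical copies.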