
Can using a pre-trained deep learning model as the feature extractor in the bag-of-deep-visual-words model always improve image classification accuracy?

Affiliation

School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China.

Publication

PLoS One. 2024 Feb 29;19(2):e0298228. doi: 10.1371/journal.pone.0298228. eCollection 2024.

Abstract

This article investigates whether higher classification accuracy can always be achieved by utilizing a pre-trained deep learning model as the feature extractor in the Bag-of-Deep-Visual-Words (BoDVW) classification model, as opposed to directly using the new classification layer of the pre-trained model for classification. Considering the multiple factors related to the feature extractor (such as model architecture, fine-tuning strategy, number of training samples, feature extraction method, and feature encoding method), we investigate these factors through experiments and then provide detailed answers to the question. In our experiments, we use five feature encoding methods: hard voting, soft voting, locality-constrained linear coding, super-vector coding, and the Fisher vector (FV). We also employ two popular feature extraction methods: one (denoted Ext-DFs(CP)) uses a convolutional or non-global pooling layer, and the other (denoted Ext-DFs(FC)) uses a fully-connected or global pooling layer. Three pre-trained models (VGGNet-16, ResNext-50(32×4d), and Swin-B) are utilized as feature extractors. Experimental results on six datasets (15-Scenes, TF-Flowers, MIT Indoor-67, COVID-19 CXR, NWPU-RESISC45, and Caltech-101) reveal that, compared to using the pre-trained model with only the new classification layer re-trained for classification, employing it as the feature extractor in the BoDVW model improves accuracy in 35 out of 36 experiments when using FV. With Ext-DFs(CP), accuracy increases by 0.13% to 8.43% (3.11% on average), and with Ext-DFs(FC), it increases by 1.06% to 14.63% (5.66% on average). Furthermore, when all layers of the pre-trained model are fine-tuned and the model is used as the feature extractor, the results vary depending on the methods used: if FV and Ext-DFs(FC) are used, accuracy increases by 0.21% to 5.65% (1.58% on average) in 14 out of 18 experiments.
Our results suggest that while using a pre-trained deep learning model as the feature extractor does not always improve classification accuracy, it holds great potential as an accuracy-improvement technique.
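To make the BoDVW pipeline concrete, below is a minimal NumPy sketch of its simplest encoding step, hard voting, applied to local deep features. It assumes the per-image local descriptors have already been extracted (e.g., from a convolutional layer of a pre-trained model, as in Ext-DFs(CP)) and that a visual codebook has already been learned (typically via k-means); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def hard_voting_encode(features, codebook):
    """Encode local deep features as an L1-normalized codeword histogram.

    features: (N, D) array of local descriptors from one image, e.g. the
              spatial positions of a conv feature map (Ext-DFs(CP)).
    codebook: (K, D) array of visual words (e.g., k-means centroids).
    Returns a (K,) histogram used as the image-level representation.
    """
    # Squared Euclidean distance from every feature to every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)  # nearest codeword per local feature
    hist = np.bincount(assignments, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()         # L1 normalization

# Toy usage: 6 local features in 2-D, a codebook of 3 visual words
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 2))
words = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
vec = hard_voting_encode(feats, words)
print(vec.shape)
```

The richer encoders compared in the paper (soft voting, locality-constrained linear coding, super-vector coding, FV) replace the one-hot nearest-codeword assignment above with soft or higher-order statistics of the same local features; the resulting vector is then fed to a classifier such as a linear SVM.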


Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82cb/10903886/a18bd18fbbbb/pone.0298228.g001.jpg
