
Can using a pre-trained deep learning model as the feature extractor in the bag-of-deep-visual-words model always improve image classification accuracy?

Affiliation

School of IoT Technology, Wuxi Institute of Technology, Wuxi, Jiangsu, China.

Publication

PLoS One. 2024 Feb 29;19(2):e0298228. doi: 10.1371/journal.pone.0298228. eCollection 2024.

Abstract

This article investigates whether higher classification accuracy can always be achieved by utilizing a pre-trained deep learning model as the feature extractor in the Bag-of-Deep-Visual-Words (BoDVW) classification model, as opposed to directly using the new classification layer of the pre-trained model for classification. Considering the multiple factors related to the feature extractor (such as model architecture, fine-tuning strategy, number of training samples, feature extraction method, and feature encoding method), we investigate these factors through experiments and then provide detailed answers to the question. In our experiments, we use five feature encoding methods: hard voting, soft voting, locality-constrained linear coding, super-vector coding, and the Fisher vector (FV). We also employ two popular feature extraction methods: one (denoted Ext-DFs(CP)) uses a convolutional or non-global pooling layer, and the other (denoted Ext-DFs(FC)) uses a fully-connected or global pooling layer. Three pre-trained models (VGGNet-16, ResNext-50(32×4d), and Swin-B) are utilized as feature extractors. Experimental results on six datasets (15-Scenes, TF-Flowers, MIT Indoor-67, COVID-19 CXR, NWPU-RESISC45, and Caltech-101) reveal that, compared to using the pre-trained model with only the new classification layer re-trained for classification, employing it as the feature extractor in the BoDVW model improves accuracy in 35 out of 36 experiments when using FV. With Ext-DFs(CP), accuracy increases by 0.13% to 8.43% (3.11% on average), and with Ext-DFs(FC), it increases by 1.06% to 14.63% (5.66% on average). Furthermore, when all layers of the pre-trained model are fine-tuned and the model is used as the feature extractor, the results vary depending on the methods used: if FV and Ext-DFs(FC) are used, accuracy increases by 0.21% to 5.65% (1.58% on average) in 14 out of 18 experiments.
Our results suggest that while using a pre-trained deep learning model as the feature extractor does not always improve classification accuracy, it holds great potential as an accuracy-improvement technique.
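To make the BoDVW pipeline concrete, below is a minimal NumPy sketch of its simplest encoding step, hard voting, applied to local deep features. It assumes the per-image local descriptors have already been extracted (e.g., from a convolutional layer of a pre-trained model, as in Ext-DFs(CP)) and that a visual codebook has already been learned (typically via k-means); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def hard_voting_encode(features, codebook):
    """Encode local deep features as an L1-normalized codeword histogram.

    features: (N, D) array of local descriptors from one image, e.g. the
              spatial positions of a conv feature map (Ext-DFs(CP)).
    codebook: (K, D) array of visual words (e.g., k-means centroids).
    Returns a (K,) histogram used as the image-level representation.
    """
    # Squared Euclidean distance from every feature to every codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)  # nearest codeword per local feature
    hist = np.bincount(assignments, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()         # L1 normalization

# Toy usage: 6 local features in 2-D, a codebook of 3 visual words
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 2))
words = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
vec = hard_voting_encode(feats, words)
print(vec.shape)
```

The richer encoders compared in the paper (soft voting, locality-constrained linear coding, super-vector coding, FV) replace the one-hot nearest-codeword assignment above with soft or higher-order statistics of the same local features; the resulting vector is then fed to a classifier such as a linear SVM.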


Fig 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/82cb/10903886/a18bd18fbbbb/pone.0298228.g001.jpg
