
Distilling knowledge from multiple foundation models for zero-shot image classification.

Affiliation

School of Computer Science and Technology, Shandong University of Science and Technology, Qingdao, Shandong, China.

Publication Information

PLoS One. 2024 Sep 20;19(9):e0310730. doi: 10.1371/journal.pone.0310730. eCollection 2024.

Abstract

Zero-shot image classification enables the recognition of new categories without requiring additional training data, thereby enhancing the model's generalization capability when specific training data are unavailable. This paper introduces a zero-shot image classification framework that recognizes new categories unseen during training by distilling knowledge from foundation models. Specifically, we first employ ChatGPT and DALL-E to synthesize reference images of unseen categories from text prompts. Then, the test image is aligned with the text and reference images using CLIP and DINO to compute logits. Finally, the predicted logits are aggregated according to their confidence to produce the final prediction. Experiments are conducted on multiple datasets, including MNIST, SVHN, CIFAR-10, CIFAR-100, and TinyImageNet. The results demonstrate that our method significantly improves classification accuracy compared to previous approaches, achieving AUROC scores of over 96% across all test datasets. Our code is available at https://github.com/1134112149/MICW-ZIC.
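The final step of the pipeline — fusing per-model logits weighted by confidence — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `aggregate_logits` and `softmax` are hypothetical, and using the maximum softmax probability as the confidence weight is an assumption; the authors' exact weighting scheme may differ.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_logits(logit_sets):
    """Confidence-weighted fusion of per-model class logits.

    Each element of `logit_sets` is one model's logits over the same
    candidate classes. Confidence is taken as the max softmax
    probability (an assumption for this sketch), so a nearly uniform
    prediction contributes less to the fused result.
    """
    probs = [softmax(l) for l in logit_sets]
    confs = np.array([p.max() for p in probs])
    weights = confs / confs.sum()
    fused = sum(w * p for w, p in zip(weights, probs))
    return int(fused.argmax()), fused

# Toy example: one confident, peaked prediction (e.g. CLIP text
# alignment) and one near-uniform, low-confidence prediction
# (e.g. DINO reference-image similarity).
clip_logits = np.array([2.0, 0.5, 0.1])
dino_logits = np.array([1.0, 1.1, 0.9])
pred, fused = aggregate_logits([clip_logits, dino_logits])
print(pred)  # the confident model dominates -> class 0
```

The fused distribution remains a valid probability vector, and the confident model's vote carries more weight than the near-uniform one.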


Figure 1 (pone.0310730.g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1744/11414985/5cce6feb7da7/pone.0310730.g001.jpg
