Prompt-guided and multimodal landscape scenicness assessments with vision-language models.

Affiliations

Laboratory of Geo-Information Science and Remote Sensing, Wageningen University, Wageningen, the Netherlands.

Instituut voor Milieuvraagstukken, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.

Publication information

PLoS One. 2024 Sep 30;19(9):e0307083. doi: 10.1371/journal.pone.0307083. eCollection 2024.

DOI:10.1371/journal.pone.0307083
PMID:39348404
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11441650/
Abstract

Recent advances in deep learning and Vision-Language Models (VLM) have enabled efficient transfer to downstream tasks even when limited labelled training data is available, as well as for text to be directly compared to image content. These properties of VLMs enable new opportunities for the annotation and analysis of images. We test the potential of VLMs for landscape scenicness prediction, i.e., the aesthetic quality of a landscape, using zero- and few-shot methods. We experiment with few-shot learning by fine-tuning a single linear layer on a pre-trained VLM representation. We find that a model fitted to just a few hundred samples performs favourably compared to a model trained on hundreds of thousands of examples in a fully supervised way. We also explore the zero-shot prediction potential of contrastive prompting using positive and negative landscape aesthetic concepts. Our results show that this method outperforms a linear probe with few-shot learning when using a small number of samples to tune the prompt configuration. We introduce Landscape Prompt Ensembling (LPE), which is an annotation method for acquiring landscape scenicness ratings through rated text descriptions without needing an image dataset during annotation. We demonstrate that LPE can provide landscape scenicness assessments that are concordant with a dataset of image ratings. The success of zero- and few-shot methods combined with their ability to use text-based annotations highlights the potential for VLMs to provide efficient landscape scenicness assessments with greater flexibility.
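The three approaches named in the abstract (contrastive prompting, rated text descriptions as in LPE, and a linear probe on frozen embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes image and text embeddings have already been produced by some pre-trained VLM, and the function names, the softmax temperature, the similarity-weighted averaging of ratings, and the ridge penalty are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def contrastive_prompt_score(image_emb, pos_emb, neg_emb, temperature=0.01):
    """Zero-shot score: softmax mass on the positive prompt.

    Assumption: one positive and one negative text embedding, compared
    to the image embedding by cosine similarity.
    """
    img = l2_normalize(image_emb)
    prompts = l2_normalize(np.stack([pos_emb, neg_emb]))
    probs = softmax(prompts @ img / temperature)
    return float(probs[0])


def prompt_ensemble_rating(image_emb, prompt_embs, prompt_ratings, temperature=0.01):
    """Similarity-weighted average of ratings attached to text descriptions.

    Assumption: this weighting scheme is illustrative only; the abstract
    does not specify how rated descriptions are combined.
    """
    img = l2_normalize(image_emb)
    prompts = l2_normalize(prompt_embs)
    weights = softmax(prompts @ img / temperature)
    return float(weights @ prompt_ratings)


def fit_linear_probe(embs, ratings, l2=1e-3):
    """Fit a single linear layer (weights plus bias) by ridge regression.

    The penalty is applied to the bias term as well, for simplicity.
    """
    X = np.hstack([embs, np.ones((len(embs), 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ ratings)


def predict_linear_probe(w, embs):
    """Apply the fitted linear layer to new embeddings."""
    X = np.hstack([embs, np.ones((len(embs), 1))])
    return X @ w


if __name__ == "__main__":
    # Synthetic stand-ins for VLM embeddings (hypothetical data).
    rng = np.random.default_rng(0)
    image = rng.normal(size=8)
    pos, neg = rng.normal(size=8), rng.normal(size=8)
    print("contrastive score:", contrastive_prompt_score(image, pos, neg))

    prompts = rng.normal(size=(4, 8))
    ratings = np.array([1.0, 4.0, 7.0, 9.0])
    print("ensemble rating:", prompt_ensemble_rating(image, prompts, ratings))
```

In practice the embeddings would come from the image and text encoders of a pre-trained VLM, and the few-shot probe would be fitted on the small labelled sample the abstract mentions; the synthetic vectors above only exercise the arithmetic.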

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/77c70e527fea/pone.0307083.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/a4261a861655/pone.0307083.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/c27a0b3ea4fb/pone.0307083.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/e1bc570eb315/pone.0307083.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/c7fe35fabb33/pone.0307083.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/f691eb233791/pone.0307083.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/84f0482b9aa7/pone.0307083.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/50c180128121/pone.0307083.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1b84/11441650/0e1195225d06/pone.0307083.g009.jpg
