Fernandes Daniel L, Ribeiro Marcos H F, Silva Michel M, Cerqueira Fabio R
Department of Informatics, Universidade Federal de Viçosa - UFV, Viçosa, Brazil.
Department of Production Engineering, Universidade Federal Fluminense - UFF, Petrópolis, Brazil.
Disabil Rehabil Assist Technol. 2025 Jul;20(5):1470-1495. doi: 10.1080/17483107.2024.2437567. Epub 2024 Dec 11.
Existing image description methods, when used as Assistive Technologies, often fall short of meeting the needs of blind or low vision (BLV) individuals. They tend to compress all visual elements into brief captions, generate disjointed sentences for each image region, or provide overly extensive descriptions.
To address these limitations, we introduce VIIDA, a procedure aimed at the Visually Impaired that Implements an Image Description Approach, focusing on webinar scenes. We also propose InViDe, an Inclusive Visual Description metric: a novel approach for evaluating image descriptions targeted at BLV people.
We reviewed existing methods and developed VIIDA by integrating a multimodal Visual Question Answering model with Natural Language Processing (NLP) filters. A scene graph-based algorithm was then applied to structure the final paragraphs. Employing NLP tools, InViDe conducts a multicriteria analysis based on accessibility standards and guidelines.
Experiments statistically demonstrate that VIIDA generates descriptions that closely align with image content and with human-written linguistic features, and that suit BLV needs. InViDe offers valuable insights into the behaviour of the compared methods - among them, state-of-the-art methods based on Large Language Models - across diverse criteria.
VIIDA and InViDe emerge as efficient Assistive Technologies, combining Artificial Intelligence models with computational and mathematical techniques to generate and evaluate image descriptions for the visually impaired at low computational cost. This work is anticipated to inspire further research and application development in the domain of Assistive Technologies. Our code is publicly available at: https://github.com/daniellf/VIIDA-and-InViDe.