Xie Jingyi, Yu Rui, Zhang He, Billah Syed Masum, Lee Sooyeon, Carroll John M
Pennsylvania State University, USA.
University of Louisville, USA.
Proc SIGCHI Conf Hum Factor Comput Syst. 2025 Apr-May;25. doi: 10.1145/3706598.3714210. Epub 2025 Apr 25.
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond basic usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users and analysis of image descriptions from both participants and social media using Be My AI (an LMM-based application), we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.