Kodandaram Satwik Ram, Uckun Utku, Bi Xiaojun, Ramakrishnan I V, Ashok Vikas
Department of Computer Science, Stony Brook University, United States.
ASSETS. 2024;2024. doi: 10.1145/3663548.3675605. Epub 2024 Oct 27.
Blind individuals, who by necessity depend on screen readers to interact with computers, face considerable challenges in navigating the diverse and complex graphical user interfaces of different computer applications. This heterogeneity often requires blind users to remember different keyboard combinations and navigation methods for each application. To alleviate this interaction burden, we present Savant, a novel assistive technology powered by large language models (LLMs) that allows blind screen reader users to interact uniformly with any application interface through natural language. Notably, when prompted by a natural language command from the user, Savant can automate a series of tedious screen reader actions on the application's control elements. These commands are flexible in that the user need not specify the exact names of the control elements. A user study of Savant with 11 blind participants demonstrated significant improvements in interaction efficiency and usability compared to current practices.