An audiovisual cognitive optimization strategy guided by salient object ranking for intelligent visual prosthesis systems.

School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, People's Republic of China.

Graduate School of Biomedical Engineering, UNSW, Sydney, NSW 2052, Australia.

J Neural Eng. 2024 Nov 29;21(6). doi: 10.1088/1741-2552/ad94a4.

Visual prostheses are effective tools for restoring vision, yet real-world complexities pose ongoing challenges. The progress in AI has led to the emergence of the concept of intelligent visual prosthetics with auditory support, leveraging deep learning to create practical artificial vision perception beyond merely restoring natural sight for the blind.

This study introduces an object-based attention mechanism that simulates human gaze points when observing the external world to descriptions of physical regions. By transforming this mechanism into a ranking problem of salient entity regions, we introduce prior visual attention cues to build a new salient object ranking (SaOR) dataset, and propose a SaOR network aimed at providing depth perception for prosthetic vision. Furthermore, we propose a SaOR-guided image description method to align with human observation patterns, toward providing additional visual information by auditory feedback. Finally, the integration of the two aforementioned algorithms constitutes an audiovisual cognitive optimization strategy for prosthetic vision.

Through conducting psychophysical experiments based on scene description tasks under simulated prosthetic vision, we verify that the SaOR method improves the subjects' performance in terms of object identification and understanding the correlation among objects. Additionally, the cognitive optimization strategy incorporating image description further enhances their prosthetic visual cognition.

This offers valuable technical insights for designing next-generation intelligent visual prostheses and establishes a theoretical groundwork for developing their visual information processing strategies. Code will be made publicly available.
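The idea of treating object-based attention as a ranking problem over salient regions can be illustrated with a small sketch. This is not the authors' SaOR network or dataset pipeline, which the abstract does not detail. It assumes, purely for illustration, that each object has a binary mask and that an attention map (for example, a fixation density map) is available, and it ranks objects by the mean attention inside each mask. The names `mean_attention`, `rank_salient_objects`, and the toy data are hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]  # attention value per pixel
Mask = List[List[int]]    # 1 where the object covers the pixel, else 0


def mean_attention(attention: Grid, mask: Mask) -> float:
    """Average attention over the pixels covered by the mask."""
    total = 0.0
    count = 0
    for att_row, mask_row in zip(attention, mask):
        for value, covered in zip(att_row, mask_row):
            if covered:
                total += value
                count += 1
    # An empty mask gets a score of 0.0 (assumption made for this sketch).
    return total / count if count else 0.0


def rank_salient_objects(attention: Grid, masks: Dict[str, Mask]) -> List[str]:
    """Return object names ordered from most to least salient."""
    scores = {name: mean_attention(attention, mask) for name, mask in masks.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy 3x3 scene: hypothetical attention map and object masks.
attention = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.0, 0.1, 0.3],
]
masks = {
    "cup":   [[1, 1, 0], [1, 0, 0], [0, 0, 0]],
    "table": [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    "lamp":  [[0, 0, 1], [0, 0, 0], [0, 0, 0]],
}

ranking = rank_salient_objects(attention, masks)
print(ranking)  # ['cup', 'table', 'lamp']
```

A rank order of this kind could then decide which objects are emphasized in the prosthetic rendering and in what order they are mentioned in a spoken description. How the published method uses the ranking is not specified beyond what the abstract states.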
