Human Enhancement & Assistive Technology Research Section, Artificial Intelligence Research Lab., Electronics Telecommunications Research Institute (ETRI), Daejeon 34129, Korea.
Int J Environ Res Public Health. 2021 Jun 8;18(12):6216. doi: 10.3390/ijerph18126216.
Visual-auditory sensory substitution has demonstrated great potential to help visually impaired and blind people recognize objects and perform basic navigational tasks. However, the high latency between visual information acquisition and auditory transduction may have hindered the successful adoption of such assistive technologies in the blind community; thus far, substitution methods have remained at the level of laboratory-scale research or pilot demonstrations. This high data-conversion latency makes it difficult to perceive fast-moving objects or rapid environmental changes. Reducing the latency requires a prior analysis of auditory sensitivity; however, existing auditory sensitivity analyses are subjective because they rely on human behavioral experiments. In this study, we therefore propose a cross-modal generative adversarial network-based evaluation method that finds the optimal auditory sensitivity for perceiving visual information, thereby reducing transmission latency in visual-auditory sensory substitution. We further conducted a human-based assessment to verify that the proposed model-based analysis agrees with human behavioral experiments. Experiments were conducted with three participant groups: sighted users (SU), congenitally blind (CB) individuals, and late-blind (LB) individuals. Results from the proposed model showed that the temporal length of the auditory signal used for sensory substitution could be reduced by 50%, indicating that the performance of the conventional vOICe method could be improved by up to a factor of two. Behavioral experiments confirmed that the model's results are consistent with human assessment. Analyzing auditory sensitivity with deep learning models thus has the potential to improve the efficiency of sensory substitution.
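For illustration, the sketch below shows a vOICe-style image-to-sound encoder in which the scan duration is an explicit parameter. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, parameter names, and the 500 Hz to 5 kHz frequency range are assumptions made for this example. Halving scan_duration_s corresponds to the 50% reduction in the temporal length of the auditory signal suggested by the model-based analysis.

```python
# Hypothetical vOICe-style encoder (assumed parameters, not the paper's code).
# Each image column becomes one time slice of the soundscape: pixel row maps
# to sine frequency (top = high pitch), pixel brightness maps to amplitude.
# Halving scan_duration_s halves the temporal length of the output signal.

import numpy as np

def encode_image_to_sound(image, scan_duration_s=1.0, sample_rate=22050,
                          f_min=500.0, f_max=5000.0):
    """Convert a grayscale image (H x W, values in [0, 1]) to a mono waveform."""
    height, width = image.shape
    samples_per_col = int(sample_rate * scan_duration_s / width)
    # Exponential frequency spacing across rows: row 0 -> f_max, bottom -> f_min.
    freqs = f_max * (f_min / f_max) ** (np.arange(height) / (height - 1))
    t = np.arange(samples_per_col) / sample_rate
    audio = []
    for col in range(width):                            # left-to-right scan
        amplitudes = image[:, col]                      # brightness -> loudness
        slice_ = (amplitudes[:, None] *
                  np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
        audio.append(slice_)
    audio = np.concatenate(audio)
    peak = np.abs(audio).max()
    return audio / peak if peak > 0 else audio          # normalize to [-1, 1]

# The conventional vOICe scan takes roughly one second per frame; the study's
# result corresponds to running the same encoder with scan_duration_s=0.5.
```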