School of Computer Science and Technology, Huaibei Normal University, 235000, Huaibei, China.
Anhui Engineering Research Center for Intelligent Computing and Application on Cognitive Behavior, 235000, Huaibei, China.
Sci Rep. 2024 Oct 3;14(1):23000. doi: 10.1038/s41598-024-74416-2.
To assist the visually impaired in daily life and to address the poor portability, high hardware cost, and environmental susceptibility of existing indoor object-finding aids, an improved YOLOv5 algorithm is proposed. Combined with a RealSense D435i depth camera and a voice system, and running on a Raspberry Pi 4B as its core, it forms an indoor object-finding device for the visually impaired. The algorithm replaces the YOLOv5s backbone with GhostNet to reduce the model's parameters and computation, incorporates a coordinate attention mechanism, and replaces the YOLOv5 neck with a bidirectional feature pyramid network to strengthen feature extraction. Compared with the original YOLOv5 model, the model size is reduced by 42.4%, the number of parameters by 47.9%, and the recall rate rises by 1.2% at the same precision. The improved algorithm is applied to the object-finding device: the target object is specified by voice, the RealSense D435i captures RGB and depth images to detect and range the object, and the device announces the object's distance by voice to help the visually impaired locate it.
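As a rough illustration of the detection-and-ranging pipeline described above, the following Python sketch (not the authors' released code) shows how a YOLOv5-style detector could be combined with aligned RGB and depth frames from a RealSense D435i and a text-to-speech announcement. It assumes stock YOLOv5s weights from Torch Hub in place of the paper's improved GhostNet/coordinate-attention/BiFPN model, a hard-coded target word in place of the voice-input module, and pyttsx3 as a stand-in for the voice broadcast system.

```python
# Minimal sketch under the assumptions stated above; requires a connected D435i.
import numpy as np
import pyrealsense2 as rs
import pyttsx3
import torch

# Stand-in detector: stock YOLOv5s from Torch Hub; the paper's improved
# GhostNet/CA/BiFPN variant would be loaded here instead if its weights were available.
model = torch.hub.load("ultralytics/yolov5", "yolov5s")

# Configure RGB and depth streams from the D435i and align depth to the colour image.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)

speaker = pyttsx3.init()
target = "cup"  # in the device this word would come from the voice-input module

try:
    while True:
        frames = align.process(pipeline.wait_for_frames())
        color_frame = frames.get_color_frame()
        depth_frame = frames.get_depth_frame()
        if not color_frame or not depth_frame:
            continue

        image = np.asanyarray(color_frame.get_data())
        # BGR -> RGB, contiguous copy for the detector.
        results = model(np.ascontiguousarray(image[..., ::-1]))

        # results.xyxy[0]: one row per detection [x1, y1, x2, y2, conf, cls]
        for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
            if model.names[int(cls)] != target:
                continue
            cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
            distance = depth_frame.get_distance(cx, cy)  # metres at the box centre
            speaker.say(f"{target} detected {distance:.1f} metres ahead")
            speaker.runAndWait()
finally:
    pipeline.stop()
```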