Neural network-based algorithm for door handle recognition using RGBD cameras.

Author information

Mochurad Lesia, Hladun Yaroslav

Affiliation

Lviv Polytechnic National University, Lviv, 79013, Ukraine.

Publication information

Sci Rep. 2024 Jul 9;14(1):15759. doi: 10.1038/s41598-024-66864-7.

DOI:10.1038/s41598-024-66864-7

PMID:38977922

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11231249/

Abstract

The ability to recognize and interact with a variety of door handle designs is an important step toward truly adaptable robots, allowing robotic systems to operate effectively across diverse environments and objects. This paper addresses the problem of developing and implementing a method by which a robot recognizes the position of a door handle using data from an RGBD camera. To this end, we propose a new approach for autonomous robots that allows them to identify and manipulate door handles in different environments. We created and annotated a dataset of 5000 images of door handles taken from different angles, with the vertex coordinates of the bounding rectangles labeled. The proposed architecture is based on MobileNetV2, combined with a custom decoder that upsamples the resolution to 448 pixels. A new activation function designed specifically for this network is implemented to improve the accuracy and efficiency of raw data processing. The most important result of this study is the model's ability to work in real time, processing up to 16 images per second. This research contributes to the practical deployment of autonomous robots and opens the way for further advances in robotics and computer vision.
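The abstract does not say how a detected bounding rectangle is turned into a position the robot can reach for. One common way to do this with RGBD data is pinhole-camera deprojection, sketched below. This is an illustration under stated assumptions, not the authors' method: the function names, the use of the median depth inside the rectangle, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical choices made for this example.

```python
from statistics import median


def bbox_center(vertices):
    """Return the pixel centre (u, v) of a rectangle given its vertices."""
    us = [u for u, _ in vertices]
    vs = [v for _, v in vertices]
    return (min(us) + max(us)) / 2.0, (min(vs) + max(vs)) / 2.0


def median_depth(depth, vertices):
    """Median of the valid (non-zero) depth readings inside the rectangle.

    `depth` is a row-major 2D list of depths in metres; zero marks a
    missing reading (an assumption about the sensor's convention).
    """
    height, width = len(depth), len(depth[0])
    us = [u for u, _ in vertices]
    vs = [v for _, v in vertices]
    # Clamp the rectangle to the image so indexing stays in bounds.
    u0, u1 = max(0, int(min(us))), min(width - 1, int(max(us)))
    v0, v1 = max(0, int(min(vs))), min(height - 1, int(max(vs)))
    samples = [
        depth[v][u]
        for v in range(v0, v1 + 1)
        for u in range(u0, u1 + 1)
        if depth[v][u] > 0
    ]
    if not samples:
        raise ValueError("no valid depth readings inside the rectangle")
    return median(samples)


def handle_position(depth, vertices, fx, fy, cx, cy):
    """Deproject the rectangle centre to camera-frame (x, y, z) in metres.

    Uses the standard pinhole model: x = (u - cx) * z / fx and
    y = (v - cy) * z / fy, with z taken from the depth image.
    """
    u, v = bbox_center(vertices)
    z = median_depth(depth, vertices)
    return (u - cx) * z / fx, (v - cy) * z / fy, z
```

Taking the median rather than the single centre pixel is a design choice for this sketch: it tolerates holes in the depth image and the gap between a handle and the door behind it better than one reading would.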
