Wang Yujing, Abd Rahman Abdul Hadi, Nor Rashid Fadilla 'Atyka, Razali Mohamad Khairulamirin Md
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia.
Faculty of Physics and Electrical and Electronic Engineering, Aba Teachers University, Wenchuan 623002, China.
Sensors (Basel). 2024 Dec 9;24(23):7855. doi: 10.3390/s24237855.
Object detection is an essential computer vision task that identifies and locates objects within images or videos and is crucial for applications such as autonomous driving, robotics, and augmented reality. Light Detection and Ranging (LiDAR) and camera sensors are widely used for reliable object detection. These sensors produce heterogeneous data due to differences in data format, spatial resolution, and environmental responsiveness. Existing review articles on object detection predominantly focus on the statistical analysis of fusion algorithms, often overlooking the complexities of aligning data from these distinct modalities, especially data alignment in dynamic environments. This paper addresses the challenges of heterogeneous LiDAR-camera alignment in dynamic environments by surveying over 20 alignment methods for three-dimensional (3D) object detection, focusing on research published between 2019 and 2024. This study introduces the core concepts of multimodal 3D object detection, emphasizing the importance of integrating data from different sensor modalities for accurate object recognition in dynamic environments. The survey then compares recent heterogeneous alignment methods in detail, analyzing critical approaches found in the literature and identifying their strengths and limitations. A classification of methods for aligning heterogeneous data in 3D object detection is presented. This paper also highlights the critical challenges in aligning multimodal data, including dynamic environments, sensor fusion, scalability, and real-time processing. These limitations are thoroughly discussed, and potential future research directions are proposed to address current gaps and advance the state of the art. By summarizing the latest advancements and highlighting open challenges, this survey aims to stimulate further research and innovation in heterogeneous alignment methods for multimodal 3D object detection, thereby pushing the boundaries of what is currently achievable in this rapidly evolving domain.