Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa Street 75, 00-662 Warsaw, Poland.
WeSub, Adama Branickiego Street 17, 02-972 Warsaw, Poland.
Sensors (Basel). 2023 Feb 21;23(5):2381. doi: 10.3390/s23052381.
A current challenge for data processing in robotics is building effective multimodal, common representations. Tremendous volumes of raw data are available, and managing them intelligently is the core concept of multimodal learning, a new paradigm for data fusion. Although several techniques for building multimodal representations have proven successful, they have not yet been analyzed and compared in a given production setting. This paper explored three of the most common techniques, (1) late fusion, (2) early fusion, and (3) the sketch, and compared them in classification tasks. It covered different types of data (modalities) that could be gathered by sensors across a wide range of sensor applications. Our experiments were conducted on the Amazon Reviews, MovieLens25M, and MovieLens1M datasets. The results confirmed that the choice of fusion technique for building a multimodal representation is crucial for obtaining the highest possible model performance from a proper combination of modalities. Consequently, we designed criteria for choosing the optimal data fusion technique.
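The difference between the first two techniques can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature vectors, and the choice of plain averaging for late fusion are assumptions made for illustration only. The sketch technique is left out because the abstract does not describe how it is built.

```python
from typing import List, Sequence


def early_fusion(modalities: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion: concatenate per-modality features into one joint vector.

    A single classifier would then be trained on the joint vector.
    """
    fused: List[float] = []
    for features in modalities:
        fused.extend(features)
    return fused


def late_fusion(scores: Sequence[float]) -> float:
    """Late fusion: combine the outputs of per-modality classifiers.

    Averaging is an assumed combination rule; voting or weighting
    are common alternatives.
    """
    return sum(scores) / len(scores)


# Hypothetical toy inputs: two modalities describing the same item.
text_features = [0.2, 0.7]
image_features = [0.9, 0.1, 0.4]

# Early fusion merges the features before any model sees them.
joint_input = early_fusion([text_features, image_features])

# Late fusion merges scores produced by separate per-modality models
# (the scores here are made-up placeholders, not model outputs).
text_score, image_score = 0.8, 0.4
fused_score = late_fusion([text_score, image_score])
```

In this sketch, early fusion decides how modalities interact before training, while late fusion keeps the models independent and combines only their decisions.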