Perez Soler Javier, Guardiola Jose-Luis, García Sastre Nicolás, Garrigues Carbó Pau, Sanchis Hernández Miguel, Perez-Cortes Juan-Carlos
Instituto Tecnológico de Informática (ITI), C. Nicolás Copérnico, 7, 46022 Valencia, Spain.
Departamento de Informática de Sistemas y Computadores (DISCA), Universitat Politècnica de València (UPV), 46022 Valencia, Spain.
Sensors (Basel). 2025 Jul 2;25(13):4127. doi: 10.3390/s25134127.
Multi-view classification (MVC) typically focuses on categorizing objects into distinct classes by employing multiple perspectives of the same objects. However, in numerous real-world applications, such as industrial inspection and quality control, there is an increasing need to distinguish particular objects from a pool of similar ones while simultaneously disregarding unknown objects. In these scenarios, relying on a single image may not provide sufficient information to effectively identify the scrutinized object, as different perspectives may reveal distinct characteristics that are essential for accurate classification. Most existing approaches operate within closed-set environments and are focused on generalization, which makes them less effective at distinguishing individual objects from others. These limitations are particularly problematic in industrial quality assessment, where distinguishing between specific objects and discarding unknowns is crucial. To address this challenge, we introduce a View-Compatible Feature Fusion (VCFF) method that utilizes images from predetermined positions as an accurate solution for multi-view classification of specific objects. Unlike other approaches, VCFF explicitly integrates pose information during the fusion process. It does not merely use pose as auxiliary data but employs it to align and selectively fuse features from different views. This mathematically explicit fusion of rotations, based on relative poses, allows VCFF to effectively combine multi-view information, enhancing classification accuracy. Through experimental evaluations, we demonstrate that the proposed VCFF method outperforms state-of-the-art MVC algorithms, especially in open-set scenarios, where the set of possible objects is not fully known in advance. Remarkably, VCFF achieves an average precision of 1.0 using only eight cameras, whereas existing methods require 20 cameras to reach at most 0.95.
In terms of AUC-ROC under the constraint of fewer than 3σ false positives (a critical metric in industrial inspection), current state-of-the-art methods achieve up to 0.72, while VCFF attains a perfect score of 1.0 with just eight cameras. Furthermore, our approach delivers highly accurate rotation estimation, maintaining an error margin slightly above 2° when sampling at 4° intervals.
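The abstract does not spell out how the pose-based fusion is computed. The sketch below is a minimal, hypothetical illustration of the general idea of pose-aligned multi-view feature fusion, not the paper's actual VCFF implementation: each view's 3D feature vectors are rotated into a common reference frame using the known relative camera rotations before being combined. All function names and the camera-to-world rotation convention are assumptions for illustration.

```python
# Hypothetical sketch of pose-aligned multi-view feature fusion.
# Assumes each view yields row-vector 3D features plus a known
# camera-to-world rotation matrix; this is NOT the paper's VCFF code.
import numpy as np

def rotation_z(deg: float) -> np.ndarray:
    """Rotation matrix about the z-axis by `deg` degrees."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def align_and_fuse(features, rotations):
    """Rotate each view's features into the first view's frame via the
    relative rotation R_0^T R_i, then fuse by averaging."""
    ref = rotations[0]
    aligned = []
    for f, R in zip(features, rotations):
        rel = ref.T @ R            # relative rotation: view i -> view 0 frame
        aligned.append(f @ rel.T)  # apply rotation to row-vector features
    return np.mean(aligned, axis=0)
```

As a sanity check, a direction observed in a reference view and the same world direction observed by a camera rotated 90° about z map to the same aligned feature, so averaging compatible views reinforces rather than blurs the descriptor.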