Choi Jun-Hyeon, Pyo Jeong-Won, An Ye-Chan, Kuc Tae-Yong
Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.
R&D Center, DXR Co., Ltd., Seoul 01411, Republic of Korea.
Sensors (Basel). 2025 Jul 25;25(15):4614. doi: 10.3390/s25154614.
This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The method aims to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor captures each object's internal appearance through normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both the object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization represents both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively with baseline methods while remaining robust in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the , which represents environments using layered structures of places, objects, and their ontological relations.
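The abstract names the three descriptor components and their integration at the object and scene levels but gives no formulas. The Python sketch below is therefore illustrative only and is not the authors' TOSD implementation. It rests on our own stand-in choices: bounding-box extent, aspect ratio, and normalized second-order central moments for shape; an intensity-normalized rg-chromaticity histogram for color; and an 8-sector direction histogram plus mean normalized centroid distance for topology. Semantic relations are omitted. All names (`ObjectRegion`, `shape_descriptor`, `color_descriptor`, `topology_descriptor`, `object_embedding`, `scene_embedding`, `square`) are hypothetical.

```python
# Illustrative sketch only: feature choices and all names are hypothetical
# stand-ins, not the published TOSD formulation.
import math
from dataclasses import dataclass


@dataclass
class ObjectRegion:
    """One segmented object; no class label is required (hypothetical type)."""
    pixels: list[tuple[int, int]]       # (row, col) coordinates inside the mask
    colors: list[tuple[int, int, int]]  # RGB of each pixel, same order as pixels


def centroid(obj: ObjectRegion) -> tuple[float, float]:
    n = len(obj.pixels)
    return (sum(r for r, _ in obj.pixels) / n, sum(c for _, c in obj.pixels) / n)


def shape_descriptor(obj: ObjectRegion) -> list[float]:
    """Translation-invariant geometry: extent, aspect ratio and normalized
    second-order central moments (approximately scale-invariant)."""
    rows = [r for r, _ in obj.pixels]
    cols = [c for _, c in obj.pixels]
    area = len(obj.pixels)
    h = max(rows) - min(rows) + 1
    w = max(cols) - min(cols) + 1
    cr, cc = centroid(obj)
    mu20 = sum((c - cc) ** 2 for c in cols)
    mu02 = sum((r - cr) ** 2 for r in rows)
    mu11 = sum((r - cr) * (c - cc) for r, c in obj.pixels)
    norm = area ** 2  # eta_pq = mu_pq / mu_00^(1 + (p+q)/2), and p+q = 2 here
    return [area / (h * w), w / h, mu20 / norm, mu02 / norm, mu11 / norm]


def color_descriptor(obj: ObjectRegion, bins: int = 4) -> list[float]:
    """Histogram of rg-chromaticity r/(r+g+b), g/(r+g+b), which ignores
    overall brightness; normalized to sum to 1."""
    hist = [0.0] * (bins * bins)
    for r, g, b in obj.colors:
        s = r + g + b
        rn, gn = (r / s, g / s) if s > 0 else (1 / 3, 1 / 3)
        i = min(int(rn * bins), bins - 1)
        j = min(int(gn * bins), bins - 1)
        hist[i * bins + j] += 1.0
    return [v / len(obj.colors) for v in hist]


def topology_descriptor(objs: list[ObjectRegion], k: int, diag: float,
                        sectors: int = 8) -> list[float]:
    """Spatial layout around object k: fraction of other objects in each
    angular sector, plus their mean centroid distance divided by diag."""
    kr, kc = centroid(objs[k])
    hist = [0.0] * sectors
    dists = []
    for i, other in enumerate(objs):
        if i == k:
            continue
        orow, ocol = centroid(other)
        dr, dc = orow - kr, ocol - kc
        angle = math.atan2(-dr, dc) % (2 * math.pi)  # rows grow downward
        hist[int(angle / (2 * math.pi) * sectors) % sectors] += 1.0
        dists.append(math.hypot(dr, dc) / diag)
    if not dists:
        return hist + [0.0]
    return [v / len(dists) for v in hist] + [sum(dists) / len(dists)]


def l2_normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def object_embedding(objs: list[ObjectRegion], k: int, diag: float) -> list[float]:
    """Object level: concatenate the three descriptors, then L2-normalize."""
    return l2_normalize(shape_descriptor(objs[k]) + color_descriptor(objs[k])
                        + topology_descriptor(objs, k, diag))


def scene_embedding(objs: list[ObjectRegion], diag: float) -> list[float]:
    """Scene level: order-invariant mean of the object embeddings."""
    embs = [object_embedding(objs, k, diag) for k in range(len(objs))]
    return l2_normalize([sum(col) / len(embs) for col in zip(*embs)])


def square(r0: int, c0: int, size: int, rgb: tuple[int, int, int]) -> ObjectRegion:
    """Hypothetical helper: a solid square object for the usage example."""
    pix = [(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)]
    return ObjectRegion(pix, [rgb] * len(pix))


# Usage: two objects in an assumed 20x20 image.
scene = [square(0, 0, 4, (200, 20, 20)), square(12, 10, 4, (20, 200, 20))]
emb = scene_embedding(scene, diag=math.hypot(20, 20))
print(len(emb))  # 30 = 5 shape + 16 color + 9 topology
```

Averaging the object embeddings makes the scene vector independent of the order in which objects are segmented, and no class label enters any step, in line with the label-free design described in the abstract. Plain concatenation weights the three parts arbitrarily; a practical system would likely tune or learn that weighting.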