TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology.

Authors

Choi Jun-Hyeon, Pyo Jeong-Won, An Ye-Chan, Kuc Tae-Yong

Affiliations

Department of Electrical and Computer Engineering, College of Information and Communication Engineering, Sungkyunkwan University, Suwon 16419, Republic of Korea.

R&D Center, DXR Co., Ltd., Seoul 01411, Republic of Korea.

Publication information

Sensors (Basel). 2025 Jul 25;25(15):4614. doi: 10.3390/s25154614.

Abstract

This paper introduces a hierarchical object-centric descriptor framework called TOSD (Triplet Object-Centric Semantic Descriptor). The goal of this method is to overcome the limitations of existing pixel-based and global feature embedding approaches. To this end, the framework adopts a hierarchical representation that is explicitly designed for multi-level reasoning. TOSD combines shape, color, and topological information without depending on predefined class labels. The shape descriptor captures the geometric configuration of each object. The color descriptor focuses on internal appearance by extracting normalized color features. The topology descriptor models the spatial and semantic relationships between objects in a scene. These components are integrated at both object and scene levels to produce compact and consistent embeddings. The resulting representation covers three levels of abstraction: low-level pixel details, mid-level object features, and high-level semantic structure. This hierarchical organization makes it possible to represent both local cues and global context in a unified form. We evaluate the proposed method on multiple vision tasks. The results show that TOSD performs competitively compared to baseline methods, while maintaining robustness in challenging cases such as occlusion and viewpoint changes. The framework is applicable to visual odometry, SLAM, object tracking, global localization, scene clustering, and image retrieval. In addition, this work extends our previous research on the , which represents environments using layered structures of places, objects, and their ontological relations.
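As a rough illustration of how shape, color, and topology cues can be combined at the object and scene levels, the following is a minimal sketch and not the authors' implementation. Every function name, the radial-distance histogram, the chromaticity normalization, the nearest-neighbour distances, and the mean pooling are assumptions chosen for brevity; the abstract does not specify how TOSD computes any of these components.

```python
import math

def centroid(pixels):
    """Mean (row, col) of an object's pixel coordinates."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)

def shape_descriptor(pixels, bins=4):
    """Assumed shape cue: histogram of centroid-to-pixel distances,
    scaled by the largest distance so it ignores position and size."""
    cy, cx = centroid(pixels)
    dists = [math.hypot(y - cy, x - cx) for y, x in pixels]
    max_d = max(dists) or 1.0  # single-pixel objects have max distance 0
    hist = [0.0] * bins
    for d in dists:
        hist[min(int(d / max_d * bins), bins - 1)] += 1.0
    return [h / len(dists) for h in hist]

def color_descriptor(colors):
    """Assumed color cue: mean chromaticity, each channel divided by r + g + b."""
    acc = [0.0, 0.0, 0.0]
    for r, g, b in colors:
        total = (r + g + b) or 1  # black pixels contribute zeros
        acc[0] += r / total
        acc[1] += g / total
        acc[2] += b / total
    return [a / len(colors) for a in acc]

def topology_descriptor(index, centroids, k=2):
    """Assumed topology cue: distances from one object to its k nearest
    neighbours, scaled by the largest pairwise distance in the scene."""
    pairwise = [math.dist(a, b)
                for i, a in enumerate(centroids)
                for b in centroids[i + 1:]]
    scale = max(pairwise, default=0.0) or 1.0
    own = sorted(math.dist(centroids[index], c) / scale
                 for j, c in enumerate(centroids) if j != index)
    return (own + [0.0] * k)[:k]  # zero-pad when fewer than k neighbours

def describe_scene(objects):
    """Concatenate the three cues per object, then mean-pool into a scene vector."""
    centroids = [centroid(o["pixels"]) for o in objects]
    object_vecs = [
        shape_descriptor(o["pixels"])
        + color_descriptor(o["colors"])
        + topology_descriptor(i, centroids)
        for i, o in enumerate(objects)
    ]
    dim = len(object_vecs[0])
    scene_vec = [sum(v[d] for v in object_vecs) / len(object_vecs)
                 for d in range(dim)]
    return object_vecs, scene_vec

# Hypothetical toy scene: two 4x4 squares with different colors and positions.
square = [(y, x) for y in range(4) for x in range(4)]
objects = [
    {"pixels": square, "colors": [(200, 50, 50)] * len(square)},
    {"pixels": [(y + 10, x + 10) for y, x in square],
     "colors": [(50, 50, 200)] * len(square)},
]
object_vecs, scene_vec = describe_scene(objects)
```

In this toy scene the two squares receive identical shape components despite sitting at different positions, while their color components differ. That mirrors, in a very simplified way, the separation of geometric and appearance cues described above, with no class labels involved.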

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c41f/12349392/ef0ca5316712/sensors-25-04614-g001.jpg
