Suppr超能文献

S²Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR.

作者信息

Pei Jialun, Guo Diandian, Zhang Jingyang, Lin Manxi, Jin Yueming, Heng Pheng-Ann

出版信息

IEEE Trans Med Imaging. 2025 Jan;44(1):361-372. doi: 10.1109/TMI.2024.3444279. Epub 2025 Jan 2.

Abstract

Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR). However, previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection. This pipeline may potentially compromise the flexibility of learning multimodal representations, consequently constraining the overall effectiveness. In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR, aimed to complementally leverage multi-view 2D scenes and 3D point clouds for SGG in an end-to-end manner. Concretely, our model embraces a View-Sync Transfusion scheme to encourage multi-view visual information interaction. Concurrently, a Geometry-Visual Cohesion operation is designed to integrate the synergic 2D semantic features into 3D point cloud features. Moreover, based on the augmented feature, we propose a novel relation-sensitive transformer decoder that embeds dynamic entity-pair queries and relational trait priors, which enables the direct prediction of entity-pair relations for graph generation without intermediate steps. Extensive experiments have validated the superior SGG performance and lower computational cost of S2Former-OR on 4D-OR benchmark, compared with current OR-SGG methods, e.g., 3 percentage points Precision increase and 24.2M reduction in model parameters. We further compared our method with generic single-stage SGG methods with broader metrics for a comprehensive evaluation, with consistently better performance achieved. Our source code can be made available at: https://github.com/PJLallen/S2Former-OR.

摘要

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验