Frisch Yannik, Sivakumar Ssharvien Kumar, Köksal Çağhan, Böhm Elsa, Wagner Felix, Gericke Adrian, Ghazaei Ghazal, Mukhopadhyay Anirban
TU Darmstadt, Fraunhoferstr. 5, 64297, Darmstadt, Germany.
Universitätsmedizin Mainz, Langenbeckstr. 1, 55131, Mainz, Germany.
Int J Comput Assist Radiol Surg. 2025 May 21. doi: 10.1007/s11548-025-03397-y.
Surgical simulation offers a promising addition to conventional surgical training. However, available simulation tools lack photorealism and rely on hard-coded behaviour. Denoising Diffusion Models are a promising alternative for high-fidelity image synthesis, but existing state-of-the-art conditioning methods fall short in providing precise control or interactivity over the generated scenes.
We introduce SurGrID, a Scene Graph to Image Diffusion Model that enables controllable surgical scene synthesis. Scene Graphs encode the spatial and semantic information of a surgical scene's components. A novel pre-training step that explicitly captures local and global information then translates these graphs into an intermediate representation.
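To illustrate what encoding spatial and semantic information in a graph might look like, the following is a minimal sketch and not the SurGrID implementation. The class names, the node labels ("tissue", "instrument"), the normalized coordinates, and the one-hot feature layout are all assumptions made for illustration; the learned intermediate representation and the diffusion model itself are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # semantic information: class name (hypothetical labels)
    x: float    # spatial information: normalized centre, assumed in [0, 1]
    y: float


@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source index, target index) pairs

    def add_node(self, label, x, y):
        """Append a node and return its index."""
        self.nodes.append(Node(label, x, y))
        return len(self.nodes) - 1

    def connect(self, i, j):
        """Record a relation from node i to node j."""
        self.edges.append((i, j))

    def move(self, i, x, y):
        """Interactive edit: change a node's position."""
        self.nodes[i].x, self.nodes[i].y = x, y

    def to_features(self, vocab):
        """One row per node: one-hot class over vocab, followed by (x, y)."""
        rows = []
        for n in self.nodes:
            one_hot = [1.0 if n.label == v else 0.0 for v in vocab]
            rows.append(one_hot + [n.x, n.y])
        return rows


# Build a two-node scene, then move the instrument as a user might.
graph = SceneGraph()
tissue = graph.add_node("tissue", 0.5, 0.5)
tool = graph.add_node("instrument", 0.2, 0.8)
graph.connect(tool, tissue)
graph.move(tool, 0.4, 0.6)
features = graph.to_features(["tissue", "instrument"])
```

In this sketch, moving a node changes only its row in `features`, which is the kind of localized, interactive edit that graph-based conditioning is meant to expose.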
Our proposed method improves the fidelity of generated images and their coherence with the graph input over the state of the art. Further, we demonstrate the simulation's realism and controllability in a user assessment study involving clinical experts.
Scene Graphs can be used effectively for precise and interactive conditioning of Denoising Diffusion Models, enabling high-fidelity simulation of surgical scenes with interactive control over the generated content.