
NeRF-OR: neural radiance fields for operating room scene reconstruction from sparse-view RGB-D videos.

Authors

Gerats Beerend G A, Wolterink Jelmer M, Broeders Ivo A M J

Affiliations

AI & Data Science Center, Meander Medical Center, Amersfoort, The Netherlands.

Department of Robotics and Mechatronics, University of Twente, Enschede, The Netherlands.

Publication

Int J Comput Assist Radiol Surg. 2025 Jan;20(1):147-156. doi: 10.1007/s11548-024-03261-5. Epub 2024 Sep 13.

Abstract

PURPOSE

RGB-D cameras in the operating room (OR) provide synchronized views of complex surgical scenes. Assimilation of this multi-view data into a unified representation allows for downstream tasks such as object detection and tracking, pose estimation, and action recognition. Neural radiance fields (NeRFs) can provide continuous representations of complex scenes with limited memory footprint. However, existing NeRF methods perform poorly in real-world OR settings, where a small set of cameras capture the room from entirely different vantage points. In this work, we propose NeRF-OR, a method for 3D reconstruction of dynamic surgical scenes in the OR.

METHODS

Where other methods for sparse-view datasets use either time-of-flight sensor depth or dense depth estimated from color images, NeRF-OR uses a combination of both. The depth estimations mitigate the missing values that occur in sensor depth images due to reflective materials and object boundaries. We propose to supervise with surface normals calculated from the estimated depths, because these are largely scale invariant.
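The surface-normal supervision described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, back-projects each depth pixel to a camera-space 3D point, and takes the cross product of finite-difference gradients to obtain per-pixel normals. Because the cross product is normalized, a global rescaling of the depth map leaves the normals (largely) unchanged, which is the scale-invariance property the abstract relies on.

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel surface normals from a depth map.

    Back-projects each pixel to a camera-space 3D point, then takes
    the cross product of the local image-space gradients of those
    points. Returns unit normals of shape (h, w, 3).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-space 3D points (pinhole model).
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1)      # (h, w, 3)
    # Finite-difference gradients along the two image axes.
    dp_du = np.gradient(points, axis=1)
    dp_dv = np.gradient(points, axis=0)
    # Normal is perpendicular to both tangent directions.
    n = np.cross(dp_du, dp_dv)                     # (h, w, 3)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n
```

For a fronto-parallel plane (constant depth), the recovered normals point along the camera's optical axis, as expected.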

RESULTS

We fit NeRF-OR to static surgical scenes in the 4D-OR dataset and show that its representations are geometrically accurate, where the state of the art collapses to sub-optimal solutions. Compared to earlier work, NeRF-OR grasps fine scene details while training 30× faster. Additionally, NeRF-OR can capture whole-surgery videos while synthesizing views at intermediate time values, with an average PSNR of 24.86 dB. Lastly, we find that our approach has merit in sparse-view settings beyond the OR, by benchmarking on the NVS-RGBD dataset, which contains as few as three training views. NeRF-OR synthesizes images with a PSNR of 26.72 dB, a 1.7% improvement over the state of the art.
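The PSNR values reported above follow the standard definition, 10·log10(MAX² / MSE), where MAX is the peak signal value (1.0 for normalized images) and MSE is the mean squared error between the synthesized and ground-truth views. A minimal sketch:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in decibels.

    pred, target: arrays of the same shape with values in [0, max_val].
    Higher is better; identical images give infinite PSNR.
    """
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, two images that differ by a uniform 0.1 (MSE = 0.01) score 20 dB, so the 24.86 dB and 26.72 dB figures above correspond to considerably smaller average errors.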

CONCLUSION

Our results show that NeRF-OR enables novel view synthesis from videos captured by a small number of cameras with entirely different vantage points, the typical camera setting in the OR. Code is available at github.com/Beerend/NeRF-OR.


Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c403/11758168/4cc148f02187/11548_2024_3261_Fig1_HTML.jpg
