Institute of Medical Informatics, University of Lübeck, Ratzeburger Allee 160, 23562, Lübeck, Germany.
Drägerwerk AG & Co. KGaA, Moislinger Allee 53-55, 23558, Lübeck, Germany.
Int J Comput Assist Radiol Surg. 2019 Nov;14(11):1871-1879. doi: 10.1007/s11548-019-02044-7. Epub 2019 Aug 6.
For many years, deep convolutional neural networks have achieved state-of-the-art results on a wide variety of computer vision tasks. 3D human pose estimation is no exception, and results on public benchmarks are impressive. However, specialized domains, such as operating rooms, pose additional challenges. Clinical settings involve severe occlusions, clutter and difficult lighting conditions. Privacy concerns of patients and staff make it necessary to use unidentifiable data. In this work, we aim to bring robust human pose estimation to the clinical domain.
We propose a 2D-3D information fusion framework that makes use of a network of multiple depth cameras and strong pose priors. In a first step, probabilities of 2D joints are predicted from single depth images. This information is fused in a shared voxel space, yielding a rough estimate of the 3D pose. Final joint positions are obtained by regressing into the latent pose space of a pre-trained convolutional autoencoder.
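The multi-view fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes calibrated cameras (intrinsics K, world-to-camera extrinsics [R | t]), projects each voxel center into every view, samples the 2D joint probability there, and averages over views; the function name and interface are hypothetical.

```python
import numpy as np

def fuse_heatmaps_into_voxels(heatmaps, intrinsics, extrinsics, voxel_centers):
    """Fuse per-view 2D joint probability maps into a shared voxel space.

    heatmaps:      list of (J, H, W) joint probability maps, one per camera
    intrinsics:    list of (3, 3) camera matrices K
    extrinsics:    list of (3, 4) world-to-camera matrices [R | t]
    voxel_centers: (V, 3) world coordinates of the voxel grid
    Returns (J, V) fused scores; the argmax over V per joint gives a
    rough 3D joint estimate, to be refined by the learned pose prior.
    """
    J, H, W = heatmaps[0].shape
    V = voxel_centers.shape[0]
    fused = np.zeros((J, V))
    homog = np.hstack([voxel_centers, np.ones((V, 1))])  # (V, 4) homogeneous
    for hm, K, Rt in zip(heatmaps, intrinsics, extrinsics):
        cam = homog @ Rt.T                    # (V, 3) camera coordinates
        pix = cam @ K.T                       # perspective projection
        uv = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)
        u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
        v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
        visible = cam[:, 2] > 0               # only voxels in front of camera
        fused[:, visible] += hm[:, v[visible], u[visible]]
    return fused / len(heatmaps)
```

Averaging (rather than multiplying) the per-view probabilities keeps a joint detectable even when it is occluded in some of the views, which is common in crowded operating rooms.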
We evaluate our approach against several baselines on the challenging MVOR dataset. Best results are obtained when fusing 2D information from multiple views and constraining the predictions with learned pose priors.
We present a robust 3D human pose estimation framework based on a multi-depth camera network in the operating room. Using depth images as the only input modality makes our approach especially interesting for clinical applications, as it preserves the anonymity of patients and staff.