Bobrow Taylor L, Golhar Mayank, Vijayan Rohan, Akshintala Venkata S, Garcia Juan R, Durr Nicholas J
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.
Division of Gastroenterology and Hepatology, Johns Hopkins Medicine, Baltimore, MD 21287, USA.
Med Image Anal. 2023 Dec;90:102956. doi: 10.1016/j.media.2023.102956. Epub 2023 Sep 7.
Screening colonoscopy is an important clinical application for several 3D computer vision techniques, including depth estimation, surface reconstruction, and missing region detection. However, the development, evaluation, and comparison of these techniques on real colonoscopy videos remain largely qualitative because ground truth data are difficult to acquire. In this work, we present the Colonoscopy 3D Video Dataset (C3VD), acquired with a high-definition clinical colonoscope and high-fidelity colon models, for benchmarking computer vision methods in colonoscopy. We introduce a novel multimodal 2D-3D registration technique that registers optical video sequences to ground truth rendered views of a known 3D model. To bridge the two modalities, optical images are transformed into depth maps with a Generative Adversarial Network, and edge features are then aligned with an evolutionary optimizer. In simulation experiments where error-free ground truth is available, this registration method achieves an average translation error of 0.321 millimeters and an average rotation error of 0.159 degrees. The method also leverages video information, improving registration accuracy by 55.6% for translation and 60.4% for rotation compared to single-frame registration. Twenty-two short video sequences were registered to generate 10,015 frames in total, with paired ground truth depth, surface normals, optical flow, occlusion, six-degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes screening videos acquired by a gastroenterologist, with paired ground truth pose and 3D surface models. The dataset and registration source code are available at https://durr.jhu.edu/C3VD.
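The abstract reports registration accuracy as an average translation error (millimeters) and an average rotation error (degrees), plus a percentage improvement over single-frame registration. The sketch below shows one common way such metrics are computed from 4x4 rigid camera poses. It assumes translation error is the Euclidean distance between translation vectors, rotation error is the geodesic angle of the relative rotation, and "improvement" means relative error reduction versus the single-frame baseline; the paper's exact definitions may differ. All function names (`pose_errors`, `mean_pose_errors`, `relative_improvement`, `rot_z`) are hypothetical and are not part of the released C3VD code.

```python
import numpy as np


def pose_errors(T_est: np.ndarray, T_gt: np.ndarray) -> tuple[float, float]:
    """Return (translation error, rotation error in degrees) for two 4x4 rigid poses.

    Assumption: translation error is the Euclidean distance between translation
    components (same units as the poses, e.g. mm); rotation error is the geodesic
    angle of the relative rotation R_est^T @ R_gt.
    """
    R_est, t_est = T_est[:3, :3], T_est[:3, 3]
    R_gt, t_gt = T_gt[:3, :3], T_gt[:3, 3]
    trans_err = float(np.linalg.norm(t_est - t_gt))
    R_rel = R_est.T @ R_gt
    # Angle of a rotation matrix: trace(R) = 1 + 2*cos(theta).
    # Clipping guards against values slightly outside [-1, 1] from rounding.
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = float(np.degrees(np.arccos(cos_angle)))
    return trans_err, rot_err


def mean_pose_errors(est_poses, gt_poses) -> tuple[float, float]:
    """Average translation and rotation errors over paired pose sequences."""
    errs = np.array([pose_errors(e, g) for e, g in zip(est_poses, gt_poses)])
    return float(errs[:, 0].mean()), float(errs[:, 1].mean())


def relative_improvement(single_frame_err: float, video_err: float) -> float:
    """Percent error reduction of video registration vs. single-frame (assumed definition)."""
    return (single_frame_err - video_err) / single_frame_err * 100.0


def rot_z(deg: float) -> np.ndarray:
    """Hypothetical helper: 4x4 pose rotating by `deg` degrees about the z axis."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T


if __name__ == "__main__":
    # Toy example: estimated pose is off by 0.3 mm along x and 0.5 degrees about z.
    T_est = rot_z(0.5)
    T_est[:3, 3] = [0.3, 0.0, 0.0]
    print(pose_errors(T_est, np.eye(4)))
```

Using the geodesic angle rather than, for example, a difference of Euler angles keeps the rotation error independent of the chosen axis convention.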