Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:3547-3552. doi: 10.1109/EMBC46164.2021.9630156.
Gastroendoscopy is a clinical standard for diagnosing and treating conditions of the digestive system, such as those of the stomach. Although gastroendoscopy offers many benefits to patients, it poses challenges for practitioners, notably the lack of 3D perception: neither depth nor endoscope pose information is available. These limitations make it difficult to navigate the endoscope and to localize lesions within the digestive tract. To address these problems, deep learning-based approaches have been proposed that augment monocular gastroendoscopy with depth and pose information. In this paper, we propose a novel supervised approach for training depth and pose estimation networks on consecutive endoscopy images to assist endoscope navigation in the stomach. We first generate real depth and pose training data using our previously proposed whole-stomach 3D reconstruction pipeline, avoiding the poor generalization from computer-generated (CG) stomach models to real data. In addition, we propose a novel generalized photometric loss function that removes the need to tune weights balancing the depth and pose loss terms, as required by existing direct depth and pose supervision approaches. We then show experimentally that the proposed generalized loss outperforms existing direct supervision losses.
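The core idea of a photometric loss, as opposed to a weighted sum of separate depth and pose terms, is to couple depth and pose through image warping: target pixels are back-projected with the predicted depth, moved by the predicted relative pose, re-projected into the source frame, and the sampled intensities are compared to the target image. The sketch below is a minimal, hypothetical NumPy illustration of that mechanism (pinhole camera, nearest-neighbour sampling); the function name and interface are illustrative, not the paper's implementation.

```python
import numpy as np

def photometric_loss(I_tgt, I_src, depth_tgt, T_tgt2src, K):
    """Illustrative photometric reprojection loss (not the paper's code).

    Because depth and pose enter a single image-warping error, no weights
    are needed to balance separate depth and pose loss terms.

    I_tgt, I_src : (H, W) grayscale frames from consecutive images
    depth_tgt    : (H, W) depth map for the target frame
    T_tgt2src    : (4, 4) rigid transform from target to source camera
    K            : (3, 3) pinhole camera intrinsics
    """
    H, W = I_tgt.shape
    # Back-project every target pixel: X = D * K^{-1} [u, v, 1]^T
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)
    cam = (np.linalg.inv(K) @ pix) * depth_tgt.reshape(1, -1)
    # Transform the 3D points into the source camera and re-project with K.
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src = K @ (T_tgt2src @ cam_h)[:3]
    us, vs = src[0] / src[2], src[1] / src[2]
    # Nearest-neighbour sampling; discard pixels that warp out of the image
    # or end up behind the source camera.
    ui, vi = np.round(us).astype(int), np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H) & (src[2] > 0)
    warped = I_src[vi[valid], ui[valid]]
    target = I_tgt.reshape(-1)[valid]
    # Mean absolute photometric error over the valid pixels.
    return np.abs(warped - target).mean()
```

With an identity pose and identical frames the warp maps every pixel onto itself, so the loss is zero; any depth or pose error perturbs the warp and raises the photometric error, which is what lets a single term supervise both networks.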