
Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy.

Affiliations

School of Computer Science, Faculty of Engineering and Physical Sciences, University of Leeds, Leeds, LS2 9JT, United Kingdom.

Department of Gastroenterology, Leeds Teaching Hospitals NHS Trust, Leeds, UK; Division of Gastroenterology and Surgical Sciences, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK.

Publication Information

Med Image Anal. 2025 Jan;99:103379. doi: 10.1016/j.media.2024.103379. Epub 2024 Nov 4.

Abstract

Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and its 3D reconstruction can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions such as lighting, large homogeneous textures, and image modality, estimating distance from the camera (i.e., depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While methods for depth estimation in computer vision have been proposed and advanced on natural scene datasets, their efficacy has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing accurate camera depths to be estimated. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate an improvement of 15.75% in relative error and a 10.7% improvement in δ accuracy over the most accurate state-of-the-art baseline, the Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset; we thereby provide a first benchmark of state-of-the-art methods on this dataset.
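
The abstract describes a shared encoder feeding two task decoders (camera depth and surface normals) that are coupled by a cross-task consistency loss. The sketch below is an illustrative PyTorch reconstruction of that setup, not the authors' released code: the layer sizes, the finite-difference normals_from_depth helper, the loss weight w_consist, and the depth_metrics helper are assumptions, and the attention mechanism mentioned in the abstract is omitted for brevity.

```python
# Minimal sketch of a multi-task depth + surface-normal network with a
# cross-task consistency loss. All architectural choices here are
# illustrative placeholders, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Shared feature extractor used by both task decoders."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    """Task-specific decoder that upsamples shared features back to image size."""
    def __init__(self, feat_ch=64, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat_ch, out_ch, 4, stride=2, padding=1),
        )

    def forward(self, f):
        return self.net(f)


def normals_from_depth(depth):
    """Approximate surface normals from a depth map via finite differences."""
    dz_dx = F.pad(depth[:, :, :, 1:] - depth[:, :, :, :-1], (0, 1, 0, 0))
    dz_dy = F.pad(depth[:, :, 1:, :] - depth[:, :, :-1, :], (0, 0, 0, 1))
    n = torch.cat([-dz_dx, -dz_dy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)  # unit-length normal vectors


class MTLDepthNet(nn.Module):
    """Shared encoder with a depth decoder and a surface-normal decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.depth_head = Decoder(out_ch=1)
        self.normal_head = Decoder(out_ch=3)

    def forward(self, img):
        f = self.encoder(img)
        depth = F.softplus(self.depth_head(f))            # strictly positive depths
        normal = F.normalize(self.normal_head(f), dim=1)  # unit normals
        return depth, normal


def mtl_loss(depth, normal, gt_depth, gt_normal, w_consist=0.1):
    """Supervised depth and normal losses plus a cross-task consistency term
    tying normals derived from the predicted depth to the predicted normals."""
    l_depth = F.l1_loss(depth, gt_depth)
    l_normal = (1 - F.cosine_similarity(normal, gt_normal, dim=1)).mean()
    l_consist = (1 - F.cosine_similarity(normals_from_depth(depth), normal, dim=1)).mean()
    return l_depth + l_normal + w_consist * l_consist


def depth_metrics(pred, gt, delta_thresh=1.25):
    """Standard monocular-depth metrics (absolute relative error, delta accuracy);
    the exact evaluation protocol in the paper may differ."""
    gt = gt.clamp(min=1e-6)  # guard against division by zero
    abs_rel = ((pred - gt).abs() / gt).mean()
    ratio = torch.max(pred / gt, gt / pred)
    delta_acc = (ratio < delta_thresh).float().mean()
    return abs_rel.item(), delta_acc.item()
```

In a training loop, mtl_loss would be backpropagated through both decoders. The consistency term exploits the fact that surface normals are a deterministic function of the depth map, so the two geometrically related outputs regularise each other, which is the intuition behind the cross-task consistency loss described in the abstract.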

