School of Computing, Faculty of Engineering and Physical Sciences, University of Leeds, Leeds, LS2 9JT, UK.
Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, OX3 7DQ, UK.
Sci Rep. 2024 Jan 23;14(1):2032. doi: 10.1038/s41598-024-52063-x.
Polyps are well-known cancer precursors identified by colonoscopy. However, variability in their size, appearance, and location makes polyp detection challenging. Moreover, colonoscopy surveillance and removal of polyps are highly operator-dependent procedures and take place within a highly complex organ topology. As a result, there is a high miss rate and incomplete removal of colonic polyps. To assist in clinical procedures and reduce miss rates, automated machine learning methods for detecting and segmenting polyps have been developed in recent years. However, the major drawback of most of these methods is their limited ability to generalise to out-of-sample, unseen datasets from different centres, populations, modalities, and acquisition systems. To test this rigorously, we, together with expert gastroenterologists, curated a multi-centre, multi-population dataset acquired from six different colonoscopy systems and challenged computational expert teams to develop robust automated detection and segmentation methods in a crowd-sourced Endoscopic computer vision challenge. This work puts forward rigorous generalisability tests and assesses the usability of the devised deep learning methods in dynamic, real-world clinical colonoscopy procedures. We analyse the results of the four top-performing teams for the detection task and the five top-performing teams for the segmentation task. Our analyses demonstrate that the top-ranking teams concentrated mainly on accuracy over the real-time performance required for clinical applicability. We further dissect the devised methods and provide an experiment-based hypothesis that reveals the need for improved generalisability to tackle the diversity present in multi-centre datasets and routine clinical procedures.
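The abstract describes generalisability tests on out-of-sample data from unseen centres. As a minimal illustrative sketch only (not the challenge's official evaluation code), the snippet below shows one way such a gap could be quantified for the segmentation task: the drop in mean Dice score between test frames from centres seen during training and frames from held-out centres. The data layout, centre labels, and field names are assumptions for illustration.

```python
# Illustrative sketch: generalisability gap for polyp segmentation,
# measured as the difference in mean Dice between seen and unseen centres.
# The record format and centre names below are hypothetical placeholders.
import numpy as np


def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)


def generalisability_gap(results: list[dict], seen_centres: set[str]) -> dict:
    """Mean Dice on seen vs. unseen centres and the gap between them.

    `results` holds one record per test frame: {"centre", "pred", "gt"}.
    `seen_centres` lists the centres represented in the training data.
    """
    seen, unseen = [], []
    for r in results:
        d = dice_score(r["pred"], r["gt"])
        (seen if r["centre"] in seen_centres else unseen).append(d)
    mean_seen = float(np.mean(seen)) if seen else float("nan")
    mean_unseen = float(np.mean(unseen)) if unseen else float("nan")
    return {"dice_seen": mean_seen,
            "dice_unseen": mean_unseen,
            "gap": mean_seen - mean_unseen}


if __name__ == "__main__":
    # Toy example with random masks standing in for model predictions.
    rng = np.random.default_rng(0)
    results = [{"centre": c,
                "pred": rng.random((64, 64)) > 0.5,
                "gt": rng.random((64, 64)) > 0.5}
               for c in ["centre_1", "centre_1", "centre_5", "centre_6"]]
    print(generalisability_gap(results, seen_centres={"centre_1", "centre_2"}))
```

A per-centre breakdown of detection metrics (e.g. mAP) could be computed analogously; a large seen-versus-unseen gap is the signature of the limited generalisability the study highlights.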