Interventional and Experimental Endoscopy (InExEn), Internal Medicine II, University Hospital Wuerzburg, Würzburg, Germany.
Artificial Intelligence and Knowledge Systems, Institute for Computer Science, Julius-Maximilians-Universität, Würzburg, Germany.
Scand J Gastroenterol. 2022 Nov;57(11):1397-1403. doi: 10.1080/00365521.2022.2085059. Epub 2022 Jun 14.
Computer-aided polyp detection (CADe) may become a standard for polyp detection during colonoscopy. Several systems are already commercially available. We report on a video-based benchmark technique for the first preclinical assessment of such systems before comparative randomized trials are undertaken. Additionally, we compare a commercially available CADe system with our newly developed one.
ENDOTEST consisted of two datasets. The validation dataset contained 48 video snippets with 22,856 manually annotated images, of which 53.2% contained polyps. The performance dataset contained 10 full-length screening colonoscopies with 230,898 manually annotated images, of which 15.8% contained a polyp. Assessment parameters were accuracy for polyp detection and the time delay to first polyp detection after polyp appearance (FDT). Two CADe systems were assessed: a commercial CADe system (GI-Genius, Medtronic) and a newly self-developed system (ENDOMIND). The latter is a convolutional neural network trained on 194,983 manually labeled images extracted from colonoscopy videos recorded mainly in six different gastroenterology practices.
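The assessment parameters described above can be illustrated with a minimal sketch. It assumes that each video frame carries one boolean ground-truth label (polyp annotated or not) and one boolean CADe output (detection shown or not); the actual evaluation protocol, for example how bounding-box overlap was scored, is not given here and may differ. The function names and the `fps` parameter are hypothetical.

```python
from typing import Optional, Sequence


def per_frame_metrics(truth: Sequence[bool], pred: Sequence[bool]) -> dict:
    """Per-frame sensitivity, specificity and accuracy from frame-level labels.

    Assumption: a polyp frame with any detection counts as a true positive.
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


def first_detection_time_ms(
    truth: Sequence[bool], pred: Sequence[bool], fps: float
) -> Optional[float]:
    """Delay in ms from the first annotated polyp frame to the first detected
    polyp frame in one video snippet.

    Returns None if the polyp never appears or is never detected.
    Assumption: the snippet shows a single polyp and has a constant frame rate.
    """
    truth = list(truth)
    if True not in truth:
        return None
    appearance = truth.index(True)
    for i in range(appearance, len(truth)):
        if truth[i] and pred[i]:
            return (i - appearance) * 1000.0 / fps
    return None


# Toy example with made-up labels (not study data), assuming 25 frames per second.
truth = [False, False, True, True, True, True, False, False]
pred = [False, True, False, False, True, True, False, False]
metrics = per_frame_metrics(truth, pred)
fdt = first_detection_time_ms(truth, pred, fps=25.0)
```

In this toy example the polyp first appears in frame 2 and is first detected in frame 4, so the FDT is two frame intervals, i.e. 80 ms at the assumed 25 frames per second.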
On ENDOTEST, both CADe systems detected all polyps in at least one image. The per-frame sensitivity and specificity in full colonoscopies were 48.1% and 93.7%, respectively, for GI-Genius, and 54% and 92.7%, respectively, for ENDOMIND. The median FDT of ENDOMIND, at 217 ms (interquartile range (IQR) 8-1533), was significantly shorter than that of GI-Genius, at 1050 ms (IQR 358-2767, p = 0.003).
Our benchmark ENDOTEST may be helpful for preclinical testing of new CADe devices. A shorter FDT appears to be associated with a higher sensitivity and a lower specificity for polyp detection.