de Maissin Astrid, Vallée Remi, Flamant Mathurin, Fondain-Bossiere Marie, Le Berre Catherine, Coutrot Antoine, Normand Nicolas, Mouchère Harold, Coudol Sandrine, Trang Caroline, Bourreille Arnaud
CHD La Roche Sur Yon, Department of Gastroenterology, La Roche Sur Yon, France.
Nantes University, CNRS, LS2N UMR 6004, Nantes, France.
Endosc Int Open. 2021 Jul;9(7):E1136-E1144. doi: 10.1055/a-1468-3964. Epub 2021 Jun 21.
Computer-aided diagnostic tools using deep neural networks are efficient for detecting lesions in endoscopy but require a very large number of images. The impact of annotation quality has not yet been tested. Here we describe a multi-expert annotated dataset of images extracted from capsule endoscopies of patients with Crohn's disease, and the impact of annotation quality on the accuracy of a recurrent attention neural network. Capsule images were first annotated by one reader and then reviewed by three experts in inflammatory bowel disease. Concordance between experts was evaluated with Fleiss' kappa, and all discordant images were read again by all the endoscopists to obtain a consensus annotation. A recurrent attention neural network developed for the study was tested before and after the consensus annotation. Available neural networks (ResNet and VGGNet) were also tested under the same conditions. The final dataset included 3498 images: 2124 non-pathological (60.7 %), 1360 pathological (38.9 %), and 14 inconclusive (0.4 %). Agreement between the experts was good for distinguishing pathological from non-pathological images, with a kappa of 0.79 (P < 0.0001). The accuracy of our classifier and of the available neural networks increased after the consensus annotation, with a precision of 93.7 %, sensitivity of 93 %, and specificity of 95 %. The accuracy of the neural network increased with improved annotations, suggesting that the number of images needed to develop these systems could be reduced by using a well-designed dataset.
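For readers unfamiliar with the agreement statistic used above, the following is a minimal sketch of Fleiss' kappa in plain Python. It is not the authors' code. The function name `fleiss_kappa`, the input layout (one row per image, one column per category, each cell holding the number of raters who chose that category), and the toy ratings are all illustrative assumptions and are not data from the study.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical layout)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    # Undefined (division by zero) if every rating falls in a single category.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (made-up ratings): 5 images, 3 raters,
# columns = (pathological, non-pathological).
toy_counts = [[3, 0], [3, 0], [0, 3], [2, 1], [0, 3]]
print(round(fleiss_kappa(toy_counts), 3))  # 0.732
```

In the toy table the three raters disagree on one image only, which gives a kappa of about 0.73. Values close to 1 indicate near-perfect agreement, and values close to 0 indicate agreement no better than chance.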
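The abstract reports precision, sensitivity, and specificity, which are easy to confuse with overall accuracy. As a hedged illustration, the sketch below derives all four from the cells of a binary confusion matrix. The function name `classification_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts,
    treating 'pathological' as the positive class (an assumption)."""
    return {
        # Of the images flagged pathological, how many truly are.
        "precision": tp / (tp + fp),
        # Of the truly pathological images, how many were flagged.
        "sensitivity": tp / (tp + fn),
        # Of the truly non-pathological images, how many were cleared.
        "specificity": tn / (tn + fp),
        # Share of all images classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, precision is 0.800, sensitivity about 0.889, specificity about 0.818, and accuracy 0.850. The four values can differ noticeably on the same predictions.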