Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.
Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing University of Technology, Beijing 100124, China.
Comput Math Methods Med. 2020 Aug 7;2020:6029258. doi: 10.1155/2020/6029258. eCollection 2020.
Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis because of the blurred edge of the tongue body, interference from pathological details, and large variations in the size and shape of the tongue. In this study, an automated tongue image segmentation method using an enhanced fully convolutional network with an encoder-decoder structure was presented. In the framework of the proposed network, a deep residual network was adopted as the encoder to obtain dense feature maps, and a Receptive Field Block was appended after the encoder. The Receptive Field Block can capture an adequate global contextual prior through its multibranch convolutional layers with varying kernel sizes. Moreover, the Feature Pyramid Network was used as the decoder to fuse multiscale feature maps and gather sufficient positional information to recover a clear contour of the tongue body. The quantitative evaluation of the segmentation results of 300 tongue images from the SIPL-tongue dataset showed that the average Hausdorff Distance, average Symmetric Mean Absolute Surface Distance, average Dice Similarity Coefficient, average precision, average sensitivity, and average specificity were 11.2963, 3.4737, 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method achieved the best performance compared with four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+). Similar results were obtained on the HIT-tongue dataset. The experimental results demonstrated that the proposed method can achieve accurate tongue image segmentation and meet the practical requirements of automated tongue diagnosis.
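The following is a minimal PyTorch sketch of the kind of architecture the abstract describes, not the authors' implementation: a ResNet encoder, a simplified multibranch dilated-convolution module standing in for the Receptive Field Block, and a Feature Pyramid Network decoder with a binary (tongue/background) segmentation head. All module names, channel widths, and the choice of ResNet-50 are illustrative assumptions.

```python
# Illustrative sketch only; hyperparameters and module wiring are assumptions,
# not the published network configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from torchvision.ops import FeaturePyramidNetwork


class SimpleRFB(nn.Module):
    """RFB-like module: parallel 3x3 convolutions with different dilation
    rates approximate multibranch convolutions with varying kernel sizes."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in (1, 3, 5)
        )
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class TongueSegNet(nn.Module):
    """Encoder-decoder sketch: ResNet-50 encoder -> RFB-like context module
    -> FPN decoder -> per-pixel tongue/background logits."""

    def __init__(self, num_classes=2):
        super().__init__()
        backbone = resnet50(weights=None)  # pretrained weights would normally be loaded
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool
        )
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        self.rfb = SimpleRFB(2048, 2048)  # enrich global context on the deepest features
        self.fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], 256)
        self.head = nn.Conv2d(256, num_classes, 1)

    def forward(self, x):
        size = x.shape[-2:]
        c1 = self.layer1(self.stem(x))
        c2 = self.layer2(c1)
        c3 = self.layer3(c2)
        c4 = self.rfb(self.layer4(c3))
        feats = self.fpn({"c1": c1, "c2": c2, "c3": c3, "c4": c4})
        # Predict from the highest-resolution FPN level, then upsample to input size.
        logits = self.head(feats["c1"])
        return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = TongueSegNet()
    out = model(torch.randn(1, 3, 512, 512))
    print(out.shape)  # torch.Size([1, 2, 512, 512])
```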