Mo Henghui, Wei Linjing
College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730070, China.
Sensors (Basel). 2024 May 28;24(11):3480. doi: 10.3390/s24113480.
The complex structure of Chinese characters, particularly the connections and intersections between strokes, makes stroke extraction and recognition prone to low accuracy and unclear segmentation. Building on the YOLOv8n-seg model, this study proposes YOLOv8n-seg-CAA-BiFPN, a fine-grained Chinese character stroke segmentation model. The proposed Coordinate-Aware Attention (CAA) mechanism divides the backbone's input feature map into four parts and applies separately weighted horizontal, vertical, and channel attention to compute and fuse key information, capturing the contextual regularity of closely spaced stroke positions. The network's neck integrates an enhanced weighted bi-directional feature pyramid network (BiFPN), strengthening the fusion of stroke features across scales. The Shape-IoU loss function replaces the conventional CIoU loss, focusing on the shape and scale of stroke bounding boxes to optimize bounding box regression. Finally, Grad-CAM++ is used to generate heatmaps of the segmentation predictions, visualizing the effective features and clarifying which regions the model attends to. Trained and tested on the public Chinese character stroke datasets CCSE-Kai and CCSE-HW, the model achieves an average accuracy of 84.71%, an average recall of 83.65%, and a mean average precision of 80.11%. Compared with the original YOLOv8n-seg and mainstream segmentation models such as SegFormer, BiSeNetV2, and Mask R-CNN, average accuracy improves by 3.50%, 4.35%, 10.56%, and 22.05%; average recall by 4.42%, 9.32%, 15.64%, and 24.92%; and mean average precision by 3.11%, 4.15%, 8.02%, and 19.33%, respectively. The results demonstrate that the YOLOv8n-seg-CAA-BiFPN network achieves accurate Chinese character stroke segmentation.
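The abstract gives only a high-level description of the CAA block. The PyTorch sketch below shows one plausible reading of it, assuming a four-way channel split with per-group learnable weights over horizontal, vertical, and channel attention; the class and parameter names (CoordinateAwareAttention, num_splits) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CoordinateAwareAttention(nn.Module):
    """Illustrative sketch: split the input feature map into four channel
    groups, gate each group with weighted horizontal, vertical, and
    channel attention, then fuse the groups back together."""

    def __init__(self, channels: int, num_splits: int = 4):
        super().__init__()
        assert channels % num_splits == 0
        self.num_splits = num_splits
        group = channels // num_splits
        # One 1x1 conv per attention direction, shared across groups.
        self.conv_h = nn.Conv2d(group, group, kernel_size=1)
        self.conv_w = nn.Conv2d(group, group, kernel_size=1)
        self.conv_c = nn.Conv2d(group, group, kernel_size=1)
        # Learnable fusion weights: one (horizontal, vertical, channel)
        # triple per group.
        self.weights = nn.Parameter(torch.ones(num_splits, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for i, part in enumerate(torch.chunk(x, self.num_splits, dim=1)):
            # Horizontal attention: pool over width -> (B, C, H, 1).
            a_h = torch.sigmoid(self.conv_h(part.mean(dim=3, keepdim=True)))
            # Vertical attention: pool over height -> (B, C, 1, W).
            a_w = torch.sigmoid(self.conv_w(part.mean(dim=2, keepdim=True)))
            # Channel attention: global pool -> (B, C, 1, 1).
            a_c = torch.sigmoid(self.conv_c(part.mean(dim=(2, 3), keepdim=True)))
            w = torch.softmax(self.weights[i], dim=0)
            # Broadcasting expands the gate to the full (B, C, H, W) map.
            gate = w[0] * a_h + w[1] * a_w + w[2] * a_c
            outs.append(part * gate)
        return torch.cat(outs, dim=1)
```

The row/column pooling is what makes the gate position-aware along each axis, which suits strokes that extend predominantly horizontally or vertically.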
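The "enhanced weighted" aspect of the neck is not detailed in the abstract; the sketch below shows the standard fast normalized fusion at the heart of BiFPN (from EfficientDet), in which each incoming feature map receives a learnable non-negative weight. Module and variable names are illustrative.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fast normalized fusion used in BiFPN: each input feature map gets a
    learnable non-negative weight, normalized so the weights sum to ~1."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs: list[torch.Tensor]) -> torch.Tensor:
        # ReLU keeps weights non-negative; the eps-stabilized
        # normalization keeps training stable.
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * xi for wi, xi in zip(w, inputs))

# Usage: fuse a top-down feature with a same-level lateral feature
# (all inputs must share one shape, e.g. after upsampling).
# fuse = WeightedFusion(num_inputs=2)
# p4_td = fuse([p4_lateral, p5_upsampled])
```

Because the weights are learned, the network can emphasize whichever resolution carries the most signal for small versus large strokes.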
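Shape-IoU (arXiv:2312.17663) augments the IoU term with a shape-weighted center distance and a width/height shape cost derived from the ground-truth box. The sketch below follows the published formulation as commonly implemented; the hyperparameters scale and theta are dataset-dependent, and the exact constants used in this study are not stated in the abstract.

```python
import torch

def shape_iou_loss(pred, target, scale: float = 0.0,
                   theta: float = 4.0, eps: float = 1e-7):
    """Sketch of the Shape-IoU loss for boxes given as (x1, y1, x2, y2)."""
    # Intersection over union.
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Widths, heights, centers.
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_t, cy_t = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # Squared diagonal of the smallest enclosing box.
    c_lt = torch.min(pred[..., :2], target[..., :2])
    c_rb = torch.max(pred[..., 2:], target[..., 2:])
    c2 = ((c_rb - c_lt) ** 2).sum(dim=-1) + eps

    # Shape weights from the ground-truth box; 'scale' tunes their sharpness.
    ww = 2 * w_t.pow(scale) / (w_t.pow(scale) + h_t.pow(scale) + eps)
    hh = 2 * h_t.pow(scale) / (w_t.pow(scale) + h_t.pow(scale) + eps)

    # Shape-weighted center distance.
    dist = hh * (cx_p - cx_t) ** 2 / c2 + ww * (cy_p - cy_t) ** 2 / c2

    # Shape cost from the width/height mismatch.
    omega_w = hh * (w_p - w_t).abs() / (torch.max(w_p, w_t) + eps)
    omega_h = ww * (h_p - h_t).abs() / (torch.max(h_p, h_t) + eps)
    shape_cost = ((1 - torch.exp(-omega_w)) ** theta
                  + (1 - torch.exp(-omega_h)) ** theta)

    return 1 - iou + dist + 0.5 * shape_cost
```

Unlike CIoU's aspect-ratio penalty, these terms weight the regression error by the ground-truth box's own shape and scale, which is the property the abstract highlights for stroke bounding boxes.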
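The abstract does not state which tooling generated the heatmaps; one common route is the open-source pytorch-grad-cam package (PyPI name: grad-cam), whose GradCAMPlusPlus class implements Grad-CAM++. In the sketch below, model and target_layer are placeholders for a trained network and one of its convolutional blocks; the right layer choice is model-specific.

```python
# pip install grad-cam  (the pytorch-grad-cam package)
import cv2
import numpy as np
import torch
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image

def stroke_heatmap(model, target_layer, image_path: str,
                   size: int = 640) -> np.ndarray:
    """Overlay a Grad-CAM++ heatmap on a character image.
    `model` is a trained torch.nn.Module and `target_layer` one of its
    convolutional blocks (both placeholders)."""
    rgb = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (size, size)).astype(np.float32) / 255.0
    input_tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)

    cam = GradCAMPlusPlus(model=model, target_layers=[target_layer])
    grayscale_cam = cam(input_tensor=input_tensor)[0]  # (H, W) map in [0, 1]
    return show_cam_on_image(rgb, grayscale_cam, use_rgb=True)
```

The returned overlay highlights the regions the model relied on, which is how the paper inspects whether attention concentrates on stroke junctions and intersections.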