

Element detection and segmentation of mathematical function graphs based on improved Mask R-CNN.

Authors

Lu Jiale, Chen Jianjun, Xu Taihua, Song Jingjing, Yang Xibei

Affiliations

School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, Jiangsu, China.

Publication Information

Math Biosci Eng. 2023 May 31;20(7):12772-12801. doi: 10.3934/mbe.2023570.

Abstract

There are approximately 2.2 billion people worldwide with varying degrees of visual impairment. Individuals with severe visual impairments rely predominantly on hearing and touch to gather external information. Reading materials for the visually impaired are currently limited, mostly audio or text, and cannot satisfy their need to comprehend graphical content. Although many scholars have investigated methods for converting visual images into tactile graphics, tactile graphic translation still falls short of visually impaired readers' needs because of the diversity of image types and the limitations of image recognition technology. The primary goal of this paper is to help the visually impaired gain a greater understanding of the natural sciences by transforming images of mathematical functions into an electronic format for the production of tactile graphics. To improve the accuracy and efficiency of graph element recognition and segmentation in function graphs, this paper proposes MA Mask R-CNN, a model that uses MA ConvNeXt as its improved feature-extraction backbone and MA BiFPN as its improved feature-fusion network; both MA ConvNeXt and MA BiFPN are novel networks introduced in this paper. The model combines local relations, global relations and cross-channel information into an attention mechanism that establishes multiple connections, and by fusing a variety of multi-scale features it improves the original Mask R-CNN's ability to detect slender and multi-type targets. Finally, experimental results show that MA Mask R-CNN attains 89.6% mAP for target detection and 72.3% mAP for target segmentation in the instance segmentation of function graphs, improvements of 9 and 12.8 mAP points, respectively, over the original Mask R-CNN.
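The abstract does not include code, but the kind of attention it describes, gating features by channel-wise importance and by spatial (local) salience before fusion, can be sketched in NumPy. Everything below is an illustrative assumption: the function names, weight shapes and the simple pooled spatial gate are not the authors' MA ConvNeXt / MA BiFPN implementation, which also incorporates a global-relation branch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: feature map of shape (C, H, W).
    # Squeeze: global average pool each channel down to one scalar -> (C,)
    squeezed = x.mean(axis=(1, 2))
    # Excite: bottleneck MLP (ReLU then sigmoid) produces a per-channel gate
    gate = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))  # (C,)
    return x * gate[:, None, None]

def spatial_attention(x):
    # Collapse channels into mean and max maps, then gate each spatial
    # location. (A learned conv would normally replace this plain average.)
    pooled = np.concatenate([x.mean(axis=0, keepdims=True),
                             x.max(axis=0, keepdims=True)], axis=0)
    gate = sigmoid(pooled.mean(axis=0))  # (H, W)
    return x * gate[None, :, :]

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))  # bottleneck: C -> C/2
w2 = rng.standard_normal((C, C // 2))  # expand back: C/2 -> C
y = spatial_attention(channel_attention(x, w1, w2))
print(y.shape)  # (8, 4, 4) — attention reweights features, shape is preserved
```

Because both gates multiply the input elementwise, such modules can be dropped into a backbone or fusion network without changing tensor shapes, which is what makes it possible to retrofit them onto an existing Mask R-CNN pipeline.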

