ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image.

Authors

Ho Vo Hoang Duy, Vo Quoc Huy, Hung Bui Thanh

Affiliation

Data Science Laboratory/Data Science Department/Faculty of Information Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh, Vietnam.

Publication details

PeerJ Comput Sci. 2024 Nov 28;10:e2536. doi: 10.7717/peerj-cs.2536. eCollection 2024.

DOI:10.7717/peerj-cs.2536

PMID:39650481

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11622835/

Abstract

Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short because they do not fully leverage both image and text features, which reduces accuracy and efficiency. In this study, we introduce ConBGAT, a model that integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs graphs from the text regions within an image, using Optical Character Recognition to detect and recognize characters. By combining image features extracted by CNNs with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), the model obtains a comprehensive and efficient data representation. Testing on real-world datasets shows that ConBGAT outperforms existing methods across multiple evaluation metrics, improving accuracy and setting a new benchmark for information extraction from scanned images.
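The pipeline described in the abstract (text regions become graph nodes, each node carries fused image and text features, and a graph attention layer mixes information between neighbouring regions) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract does not state how edges are built, how features are fused, or how the attention layer is configured, so the k-nearest-neighbour edge rule, the concatenation in `fuse`, the single attention head without a learned projection, and all function names and toy values below are assumptions made for illustration. The short feature vectors stand in for real CNN and DistilBERT outputs.

```python
import math

def box_center(box):
    """Center (x, y) of an axis-aligned box (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def build_knn_edges(boxes, k=2):
    """Assumed edge rule: link each text region to itself and its k nearest regions."""
    centers = [box_center(b) for b in boxes]
    edges = {}
    for i, ci in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.dist(ci, centers[j]),
        )
        edges[i] = [i] + others[:k]  # self-loop plus k nearest neighbours
    return edges

def fuse(image_feat, text_feat):
    """Assumed fusion: concatenate the image and text vectors of one region."""
    return list(image_feat) + list(text_feat)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(node_feats, edges, attn_src, attn_dst):
    """One single-head graph attention step (no learned projection matrix).

    score(i, j) = LeakyReLU(attn_src . h_i + attn_dst . h_j), normalised with a
    softmax over the neighbours of i; the output is the weighted neighbour mean.
    """
    out = []
    for i, h_i in enumerate(node_feats):
        nbrs = edges[i]
        scores = [
            leaky_relu(dot(attn_src, h_i) + dot(attn_dst, node_feats[j]))
            for j in nbrs
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([
            sum(w * node_feats[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(h_i))
        ])
    return out

# Toy stand-ins: three OCR boxes with 2-d "image" and 2-d "text" features each.
boxes = [(0, 0, 10, 5), (12, 0, 30, 5), (0, 20, 10, 25)]
image_feats = [[1.0, 0.0], [3.0, 2.0], [0.0, 1.0]]
text_feats = [[2.0, 4.0], [0.0, 0.0], [1.0, 1.0]]

nodes = [fuse(img, txt) for img, txt in zip(image_feats, text_feats)]
edges = build_knn_edges(boxes, k=1)
updated = gat_layer(
    nodes, edges,
    attn_src=[0.1, 0.0, 0.2, 0.0],
    attn_dst=[0.0, 0.3, 0.0, 0.1],
)
```

Each row of `updated` is a convex combination of that region's own features and its neighbours' features, which is the sense in which a graph attention layer lets nearby text regions inform each other's representation. In a trained model the attention vectors and a projection matrix would be learned, and a classifier on top of the updated node features would assign a label to each region.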

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7a0/11622835/6c1003c65648/peerj-cs-10-2536-g001.jpg


