ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image.

Author information

Ho Vo Hoang Duy, Vo Quoc Huy, Hung Bui Thanh

Affiliations

Data Science Laboratory/Data Science Department/Faculty of Information Technology, Industrial University of Ho Chi Minh City, Ho Chi Minh, Vietnam.

Publication information

PeerJ Comput Sci. 2024 Nov 28;10:e2536. doi: 10.7717/peerj-cs.2536. eCollection 2024.

Abstract

Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short because they do not adequately leverage both image and text features, which leads to less accurate and less efficient outcomes. In this study, we introduce ConBGAT, a model that integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, using Optical Character Recognition to detect and interpret characters. By combining image features extracted by CNNs with text features from Distilled Bidirectional Encoder Representations from Transformers (DistilBERT), our model achieves a comprehensive and efficient data representation. Testing on real-world datasets shows that ConBGAT significantly outperforms existing methods across multiple evaluation metrics. This advancement not only improves accuracy but also sets a new benchmark for information extraction from scanned images.
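The pipeline the abstract describes (text regions as graph nodes, image and text features combined per node, attention over neighbouring nodes) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how edges are built, which CNN is used, or which dimensions apply. The k-nearest-neighbour edges, the random placeholder features standing in for CNN and DistilBERT outputs, and all names below (`knn_edges`, `gat_layer`, `img_dim`, `txt_dim`) are assumptions made for illustration. The attention step follows the standard single-head graph attention formulation.

```python
import math
import random


def knn_edges(centers, k):
    """Assumed graph construction: link each text box to itself and its k nearest boxes."""
    neighbors = []
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        neighbors.append([i] + [j for _, j in dists[:k]])
    return neighbors


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention: h_i' = sum_j alpha_ij * (W h_j)."""
    h = [matvec(W, x) for x in features]
    out = []
    for i, nbrs in enumerate(neighbors):
        # Unnormalised scores e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, h[i] + h[j]))) for j in nbrs
        ]
        # Softmax over the neighbourhood (shifted by the max for stability)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append(
            [sum(al * h[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(h[i]))]
        )
    return out


# Hypothetical inputs: centres of five OCR text boxes and random stand-ins
# for per-box CNN image features and DistilBERT text embeddings.
rng = random.Random(0)
centers = [(10, 10), (120, 12), (15, 60), (125, 58), (70, 110)]
img_dim, txt_dim, out_dim = 4, 6, 3
features = [
    [rng.gauss(0, 1) for _ in range(img_dim)]      # placeholder image features
    + [rng.gauss(0, 1) for _ in range(txt_dim)]    # placeholder text features
    for _ in centers
]
W = [[rng.gauss(0, 0.1) for _ in range(img_dim + txt_dim)] for _ in range(out_dim)]
a = [rng.gauss(0, 0.1) for _ in range(2 * out_dim)]

neighbors = knn_edges(centers, k=2)
node_embeddings = gat_layer(features, neighbors, W, a)
```

In a real system the placeholder vectors would come from a trained CNN and a DistilBERT encoder, the weights `W` and `a` would be learned, and the resulting node embeddings would feed a classifier that labels each text region with a field type.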

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d7a0/11622835/6c1003c65648/peerj-cs-10-2536-g001.jpg
