
Enterprise chart question and answer method based on multi modal cross fusion

Authors

Wang Xinxin, Chen Liang, Liu Changhong, Liu Jinyu

Affiliations

School of Economics and Management, Shangluo University, Shangluo, 726000, China.

The Shaanxi Key Laboratory of Clothing Intelligence, Xi'an Polytechnic University, Xi'an, 710048, China.

Publication

Sci Rep. 2025 Jan 6;15(1):908. doi: 10.1038/s41598-024-83652-5.

DOI: 10.1038/s41598-024-83652-5
PMID: 39762295
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11704073/
Abstract

To enhance enterprises' interactive exploration capabilities for unstructured chart data, this paper proposes a multimodal chart question-answering method. Facing the challenge of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding technology to achieve character-level precise text annotation. Additionally, we combine a key point detection algorithm to extract numerical information from the charts and convert it into structured table data. Finally, by employing a multimodal cross-fusion model, we deeply integrate the queried charts, user questions, and generated table data to ensure that the model can comprehensively capture chart information and accurately answer user questions. Experimental validation has demonstrated that our method achieves a precision of 91.58% in chart information extraction and a chart question-answering accuracy of 82.24%, fully proving the significant advantages of our proposed method in enhancing chart text recognition and question-answering capabilities. Through practical enterprise application cases, our method has shown its ability to answer four types of chart questions, exhibiting mathematical reasoning capabilities and providing robust support for enterprise data analysis and decision-making.
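The paper's own implementation is not reproduced here, but the Gaussian heatmap encoding it describes for character-level text annotation can be illustrated with a minimal sketch: each annotated character center is rendered as a 2-D Gaussian peak, giving the detector a soft regression target instead of a hard box label. The function name, signature, and parameters below are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_heatmap(height, width, centers, sigma=3.0):
    """Render a heatmap with a 2-D Gaussian peak at each character center.

    Each (cx, cy) center gets a peak value of 1.0 that decays smoothly
    with distance; overlapping characters keep the stronger response.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for cx, cy in centers:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)  # element-wise max merges peaks
    return heatmap

# Example: two character centers on a 32x64 canvas
hm = gaussian_heatmap(32, 64, [(10, 16), (40, 16)])
print(hm.shape)          # (32, 64)
print(float(hm[16, 10])) # 1.0 at an annotated center
```

A supervised text detector would then regress this heatmap and recover character positions as its local maxima, which is what makes curved and irregular text tractable at the character level.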


Figures (PMC11704073):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/6e7eb2a3d1ce/41598_2024_83652_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/4069696252ca/41598_2024_83652_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/4891ee301e62/41598_2024_83652_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/e025b5ad4930/41598_2024_83652_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/944ec9db78e2/41598_2024_83652_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/b4d2a9ca37ea/41598_2024_83652_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/9673cd5dbe6d/41598_2024_83652_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/0a054dc8fe9b/41598_2024_83652_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9829/11704073/731cbf61883a/41598_2024_83652_Fig9_HTML.jpg

Similar Articles

1
Enterprise chart question and answer method based on multi modal cross fusion.
Sci Rep. 2025 Jan 6;15(1):908. doi: 10.1038/s41598-024-83652-5.
2
MMAgentRec, a personalized multi-modal recommendation agent with large language model.
Sci Rep. 2025 Apr 8;15(1):12062. doi: 10.1038/s41598-025-96458-w.
3
Text Matching in Insurance Question-Answering Community Based on an Integrated BiLSTM-TextCNN Model Fusing Multi-Feature.
Entropy (Basel). 2023 Apr 10;25(4):639. doi: 10.3390/e25040639.
4
Multi-modal adaptive gated mechanism for visual question answering.
PLoS One. 2023 Jun 28;18(6):e0287557. doi: 10.1371/journal.pone.0287557. eCollection 2023.
5
BPI-MVQA: a bi-branch model for medical visual question answering.
BMC Med Imaging. 2022 Apr 29;22(1):79. doi: 10.1186/s12880-022-00800-x.
6
The multi-modal fusion in visual question answering: a review of attention mechanisms.
PeerJ Comput Sci. 2023 May 30;9:e1400. doi: 10.7717/peerj-cs.1400. eCollection 2023.
7
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
8
An effective spatial relational reasoning networks for visual question answering.
PLoS One. 2022 Nov 28;17(11):e0277693. doi: 10.1371/journal.pone.0277693. eCollection 2022.
9
Structured Multimodal Attentions for TextVQA.
IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):9603-9614. doi: 10.1109/TPAMI.2021.3132034. Epub 2022 Nov 7.
10
SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.
