Wang Xinxin, Chen Liang, Liu Changhong, Liu Jinyu
School of Economics and Management, Shangluo University, Shangluo, 726000, China.
The Shaanxi Key Laboratory of Clothing Intelligence, Xi'an Polytechnic University, Xi'an, 710048, China.
Sci Rep. 2025 Jan 6;15(1):908. doi: 10.1038/s41598-024-83652-5.
To strengthen enterprises' ability to interactively explore unstructured chart data, this paper proposes a multimodal chart question-answering method. To address the difficulty of recognizing curved and irregular text in charts, we introduce Gaussian heatmap encoding to achieve precise character-level text annotation. We then apply a key point detection algorithm to extract numerical information from the charts and convert it into structured table data. Finally, a multimodal cross-fusion model deeply integrates the queried chart, the user's question, and the generated table data, so that the model captures the chart information comprehensively and answers user questions accurately. Experiments show that our method achieves a precision of 91.58% in chart information extraction and an accuracy of 82.24% in chart question answering, demonstrating the advantages of the proposed method in chart text recognition and question answering. In practical enterprise application cases, the method answered four types of chart questions and exhibited mathematical reasoning ability, providing support for enterprise data analysis and decision-making.
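The idea behind Gaussian heatmap encoding for character-level annotation can be illustrated with a minimal sketch. It assumes each character has already been annotated with an axis-aligned box `(x_min, y_min, x_max, y_max)` and renders one 2D Gaussian per character, with a spread proportional to the box size. The function name `gaussian_char_heatmap` and the `sigma_scale` parameter are hypothetical choices, not values or code from the paper. Because each character gets its own peak, the encoding does not depend on whether the text line is straight, curved, or rotated. Other variants warp a Gaussian template onto rotated character quadrilaterals instead of using axis-aligned boxes.

```python
import math
from typing import List, Tuple

# Hypothetical annotation format: one axis-aligned box per character, in pixels.
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def gaussian_char_heatmap(width: int, height: int, boxes: List[Box],
                          sigma_scale: float = 0.25) -> List[List[float]]:
    """Render one 2D Gaussian per character box; overlapping peaks are merged with max.

    sigma_scale is an assumed hyperparameter (sigma = box side * sigma_scale).
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Spread scales with the box size so small and large glyphs get comparable peaks.
        sx = max((x1 - x0) * sigma_scale, 1e-6)
        sy = max((y1 - y0) * sigma_scale, 1e-6)
        # Only pixels inside the box (clamped to the image) receive a value.
        for y in range(max(0, int(y0)), min(height, math.ceil(y1))):
            for x in range(max(0, int(x0)), min(width, math.ceil(x1))):
                value = math.exp(-(((x - cx) ** 2) / (2 * sx ** 2)
                                   + ((y - cy) ** 2) / (2 * sy ** 2)))
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap
```

For two boxes `(0, 0, 4, 4)` and `(5, 1, 9, 5)` on a 10×6 canvas, the pixels at the box centers, `heatmap[2][2]` and `heatmap[3][7]`, both equal 1.0, and pixels outside every box stay 0.0. A text detector trained on such targets predicts one peak per character, which can then be grouped into words along the curve.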
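The step that turns detected key points into structured table data can be sketched as a linear axis calibration. This sketch assumes a linear (non-logarithmic) y-axis, that two y-axis tick labels have already been recognized together with their pixel rows, and that the key point detector returns one pixel row per data label (for example, the top of each bar). The function names `pixel_to_value` and `keypoints_to_table` are hypothetical; the paper does not specify how pixel coordinates are mapped to values.

```python
from typing import Dict, List, Tuple

# Hypothetical calibration input: (pixel_row, data_value) for one recognized tick.
Tick = Tuple[float, float]


def pixel_to_value(y_px: float, tick_a: Tick, tick_b: Tick) -> float:
    """Linearly map a pixel row to a data value using two recognized axis ticks."""
    (ya, va), (yb, vb) = tick_a, tick_b
    if ya == yb:
        raise ValueError("axis ticks must be at different pixel rows")
    # Image y grows downward, so the slope is usually negative; the formula handles either sign.
    return va + (y_px - ya) * (vb - va) / (yb - ya)


def keypoints_to_table(keypoints: Dict[str, float], tick_a: Tick, tick_b: Tick,
                       ndigits: int = 2) -> List[Tuple[str, float]]:
    """Convert {label: key point pixel row} into (label, value) table rows."""
    return [(label, round(pixel_to_value(y, tick_a, tick_b), ndigits))
            for label, y in keypoints.items()]
```

For example, with ticks at pixel row 400 (value 0) and pixel row 100 (value 300), `keypoints_to_table({"Q1": 250.0, "Q2": 160.0}, (400.0, 0.0), (100.0, 300.0))` yields `[("Q1", 150.0), ("Q2", 240.0)]`. Rows in this form can then be serialized as the table input that is fused with the chart image and the user's question.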