Large language model driven transferable key information extraction mechanism for nonstandardized tables.

Authors

Hu Rong, Yang Ye, Liu Sen, Li Zuchen, Liu Jingyi, Ding Xingchen, Sun Hanchi, Ren Lingli

Affiliations

Customs and Public Management College, Shanghai Customs University, Shanghai, 201204, China.

School of Electronic Information, Shanghai DianJi University, Shanghai, 201306, China.

Publication information

Sci Rep. 2025 Aug 14;15(1):29802. doi: 10.1038/s41598-025-15627-z.

DOI:10.1038/s41598-025-15627-z

PMID:40813619

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12354842/

Abstract

Extracting key information from unstructured tables poses significant challenges due to layout variability, dependence on large annotated datasets, and inability of existing methods to directly output structured formats like JSON. These limitations hinder scalability and generalization to unseen document formats. We propose the Large Language Model Driven Transferable Key Information Extraction Mechanism (LLM-TKIE), which employs text detection to identify relevant regions in document images, followed by text recognition to extract content. An LLM then performs semantic reasoning, including completeness verification and key information extraction, before organizing data into structured formats. Without fine-tuning, LLM-TKIE achieves an F1-score of 80.9 and tree edit distance-based accuracy of 88.85 on CORD, and an F1-score of 83.9 with 93.3 accuracy on SROIE, demonstrating robust generalization and structural precision. Notably, our method significantly outperforms state-of-the-art multimodal large models on unlabeled customs domain datasets by 5-8% in accuracy. Additionally, our evaluation of multiple large language models of various sizes across 15 quantization strategies provides valuable insights for selecting and optimizing LLMs for key information extraction tasks, offering practical guidance for system development.
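
The last stage described in the abstract (LLM reasoning over recognized text, a completeness check, and structured output) can be illustrated with a minimal sketch. This is not the authors' implementation: `TextRegion`, `build_prompt`, `extract_key_info`, the prompt wording, and the `llm` callable are all hypothetical names introduced here, and the text detection and recognition steps are assumed to have already produced the regions.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TextRegion:
    """One recognized text box (hypothetical container for OCR output)."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in image pixels
    text: str


def build_prompt(regions: List[TextRegion], fields: List[str]) -> str:
    """Serialize OCR text in reading order and ask for the target fields."""
    # Approximate reading order: top-to-bottom, then left-to-right.
    ordered = sorted(regions, key=lambda r: (r.box[1], r.box[0]))
    lines = [r.text for r in ordered]
    return (
        "Extract the following fields as a JSON object: "
        + ", ".join(fields)
        + "\nUse null for any field not present.\nOCR text:\n"
        + "\n".join(lines)
    )


def extract_key_info(
    regions: List[TextRegion],
    fields: List[str],
    llm: Callable[[str], str],  # assumption: takes a prompt, returns raw text
) -> Dict[str, object]:
    """Query the LLM, parse its JSON reply, and report missing fields."""
    raw = llm(build_prompt(regions, fields))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # treat an unparseable reply as "nothing extracted"
    if not isinstance(data, dict):
        data = {}
    result = {name: data.get(name) for name in fields}
    # Simple completeness check: list every requested field left empty.
    missing = [name for name in fields if result[name] is None]
    return {"fields": result, "missing": missing}
```

In use, `llm` would wrap whichever model is deployed. A non-empty `missing` list could trigger a retry or a manual review, though the abstract does not say how the paper handles incomplete extractions.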
