ChunkUIE: Chunked instruction-based unified information extraction.

Authors

Li Wei, Liu Yingzhen, Yang Yinling, Zhang Ting, Men Wei

Affiliations

National Defense University, Beijing, China.

State Key Laboratory of Geo-Information Engineering, Beijing, China.

Publication

PLoS One. 2025 Jun 27;20(6):e0326764. doi: 10.1371/journal.pone.0326764. eCollection 2025.

Abstract

Large language models (LLMs) have demonstrated remarkable performance across various linguistic tasks, yet existing LLMs perform inadequately in information extraction tasks for both Chinese and English. Numerous studies attempt to enhance model performance by increasing the scale of training data. However, discrepancies in the number and type of schemas used during training and evaluation can harm model effectiveness. To tackle this challenge, we propose ChunkUIE, a unified information extraction model that supports Chinese and English. We design a chunked instruction construction strategy that randomly and reproducibly divides all schemas into chunks containing an identical number of schemas, such that the union of schemas across all chunks covers every schema. By limiting the number of schemas in each instruction, this strategy addresses the performance degradation caused by inconsistencies in schema counts between training and evaluation. Additionally, we construct challenging negative schemas using a predefined hard schema dictionary, which mitigates the model's semantic confusion between similar schemas. Experimental results demonstrate that ChunkUIE enhances zero-shot performance in information extraction.
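
The abstract gives no implementation details, so the following is only a minimal sketch of what a chunked instruction construction step and a hard-negative sampling step could look like. The function names (`chunk_schemas`, `sample_hard_negatives`), the seeded shuffle, the padding rule for a short final chunk, and the dictionary layout are all assumptions made for illustration, not the authors' method.

```python
import random


def chunk_schemas(schemas, chunk_size, seed=0):
    """Split schemas into chunks of chunk_size items each (hypothetical helper).

    The seeded shuffle makes the split random but reproducible, and the
    union of all chunks covers every input schema.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Sort first so the result does not depend on the input order.
    shuffled = sorted(set(schemas))
    rng = random.Random(seed)
    rng.shuffle(shuffled)
    chunks = [shuffled[i:i + chunk_size]
              for i in range(0, len(shuffled), chunk_size)]
    # Assumption: a short final chunk is padded with schemas sampled from
    # the other chunks so that every chunk has the same size. If there are
    # fewer schemas than chunk_size, the single chunk stays short.
    if chunks and len(chunks[-1]) < chunk_size:
        last = chunks[-1]
        pool = [s for s in shuffled if s not in last]
        need = chunk_size - len(last)
        last.extend(rng.sample(pool, min(need, len(pool))))
    return chunks


def sample_hard_negatives(positive_schemas, hard_schema_dict, max_negatives, seed=0):
    """Pick confusable schemas that are absent from the positives (hypothetical helper).

    hard_schema_dict is assumed to map a schema name to a list of
    semantically similar schema names.
    """
    rng = random.Random(seed)
    positives = set(positive_schemas)
    candidates = sorted(
        {neg for s in positives for neg in hard_schema_dict.get(s, [])} - positives
    )
    return rng.sample(candidates, min(max_negatives, len(candidates)))
```

Under these assumptions, calling `chunk_schemas` twice with the same schema set and seed returns the same chunks, and each chunk could then be rendered into one instruction, optionally extended with the output of `sample_hard_negatives`.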
