Chen Mei, Zhang Tingting, Wang Shibin
Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China.
School of Information Engineering, Minzu University of China, Beijing 100081, China.
PLoS One. 2025 Apr 8;20(4):e0320123. doi: 10.1371/journal.pone.0320123. eCollection 2025.
Given the scarcity of annotated data, current deep learning methods face challenges in document-level chemical-disease relation extraction, making it difficult to achieve both precise extraction, which identifies relation types, and comprehensive extraction, which identifies relation-related factors. This study tests the abilities of three large language models (LLMs), GPT3.5, GPT4.0, and Claude-opus, to perform precise and comprehensive document-level chemical-disease relation extraction on a self-constructed dataset. First, based on the task characteristics, this study designs six workflows for precise extraction and five workflows for comprehensive extraction using prompt engineering strategies, and analyzes the characteristics of the extraction process through the performance differences across workflows. Second, this study analyzes content bias in LLM extraction by examining how effectively the different workflows extract different types of content. Finally, this study analyzes the error characteristics of the LLMs' incorrect extractions. The experimental results show that: (1) the LLMs demonstrate good extraction capability, achieving highest F1 scores of 87% and 73% in the precise and comprehensive extraction tasks, respectively; (2) during extraction, the LLMs exhibit a certain stubbornness, limiting the effectiveness of prompt engineering strategies; (3) in terms of extracted content, the LLMs show a content bias, identifying positive relations such as induction and acceleration more reliably; (4) extraction errors stem chiefly from the LLMs' misunderstanding of implicit meanings in biomedical texts.
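A prompt-engineering workflow of the kind described, in which an LLM is asked to emit typed chemical-disease triples from a full document, might be sketched as below. This is an illustrative sketch only: the template wording, the relation labels, and the function name `build_precise_prompt` are hypothetical and not the authors' actual prompts.

```python
# Hypothetical sketch of a zero-shot prompt for "precise extraction":
# asking an LLM to list every (chemical, relation, disease) triple in a
# document, with the relation type drawn from a fixed label set.
# Labels and wording are illustrative, not taken from the paper.

RELATION_TYPES = ["induces", "accelerates", "treats", "no relation"]

def build_precise_prompt(document: str) -> str:
    """Build a document-level extraction prompt for an LLM."""
    labels = ", ".join(RELATION_TYPES)
    return (
        "Read the biomedical document below and list every "
        "chemical-disease relation it expresses as a triple "
        "(chemical, relation, disease), choosing the relation "
        f"from: {labels}.\n\n"
        f"Document:\n{document}"
    )

prompt = build_precise_prompt("Aspirin-induced asthma was observed ...")
```

The returned string would then be sent to the LLM; comprehensive extraction would extend the template to also request relation-related factors (e.g., dosage or patient context).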
This study provides practical workflows for precise and comprehensive extraction of document-level chemical-disease relations and also indicates that optimizing training data is the key to building more efficient and accurate extraction methods in the future.
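The F1 scores reported above can be understood as exact-match scoring of predicted triples against gold annotations. The sketch below shows one plausible scoring routine; the function name and the example triples are hypothetical, not the paper's evaluation code or data.

```python
# Illustrative sketch: scoring document-level chemical-disease relation
# extraction as sets of (chemical, relation, disease) triples.
# All names and example triples are hypothetical.

def f1_score(gold: set, predicted: set) -> float:
    """Micro-F1 over exact-match relation triples."""
    tp = len(gold & predicted)  # true positives: triples in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# "Precise extraction" requires the relation type to match as well.
gold = {("aspirin", "induces", "asthma"),
        ("cisplatin", "induces", "nephrotoxicity")}
pred = {("aspirin", "induces", "asthma"),
        ("aspirin", "accelerates", "ulcer")}

score = f1_score(gold, pred)  # one of two predictions is correct -> 0.5
```

Under this metric, a model that recovers the correct pair but mislabels the relation type scores zero for that triple, which is why precise extraction yields lower F1 than pair-only matching.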