Two stage large language model approach enhancing entity classification and relationship mapping in radiology reports.

Authors

Shin Chaiho, Eom Dareen, Lee Sang Min, Park Ji Eun, Kim Kwangsoo, Lee Kye Hwa

Affiliations

Interdisciplinary Program of Medical Informatics, College of Medicine, Seoul National University, Seoul, Republic of Korea.

Department of Radiology and Research Institute of Radiology of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea.

Publication information

Sci Rep. 2025 Aug 27;15(1):31550. doi: 10.1038/s41598-025-16213-z.

DOI:10.1038/s41598-025-16213-z

PMID:40866412

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12391506/

Abstract

Large language models (LLMs) hold transformative potential for medical image labeling in radiology, addressing challenges posed by linguistic variability in reports. We developed a two-stage natural language processing pipeline that combines Bidirectional Encoder Representations from Transformers (BERT) and an LLM to analyze radiology reports. In the first stage (Entity Key Classification), a BERT model identifies and classifies clinically relevant entities mentioned in the text. In the second stage (Relationship Mapping), the extracted entities are passed to the LLM, which infers relationships between entity pairs while accounting for whether each entity is actually present. The pipeline targets lesion-location mapping in chest CT and diagnosis-episode mapping in brain MRI, both of which are clinically important for structuring radiologic findings and capturing temporal patterns of disease progression. Using over 400,000 reports from Asan Medical Center in Seoul, our pipeline achieved a macro F1-score of 77.39 for chest CT and 70.58 for brain MRI. These results highlight the effectiveness of integrating BERT with an LLM to enhance diagnostic accuracy in radiology report analysis.
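For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1-score is conventionally computed: F1 is calculated per class and then averaged without weighting by class frequency. It is not the authors' evaluation code. The function name `macro_f1` and the labels `present`, `absent`, and `uncertain` are hypothetical, chosen only for illustration, and the sketch returns a value in [0, 1], whereas the abstract appears to report scores on a 0 to 100 scale.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels, not taken from the paper
y_true = ["present", "present", "absent", "absent", "uncertain", "present"]
y_pred = ["present", "absent", "absent", "absent", "present", "present"]
score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because every class contributes equally to the average, a rare class that is never predicted correctly (here `uncertain`) pulls the score down as much as a common one would.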
