Su Yvonne, Babore Yonatan B, Kahn Charles E
Department of Radiology, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA 19104, USA.
Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
J Imaging Inform Med. 2025 Jun;38(3):1297-1303. doi: 10.1007/s10278-024-01274-9. Epub 2024 Sep 25.
Natural language processing (NLP) is crucial to extract information accurately from unstructured text to provide insights for clinical decision-making, quality improvement, and medical research. This study compared the performance of a rule-based NLP system and a medical-domain transformer-based model to detect negated concepts in radiology reports. Using a corpus of 984 de-identified radiology reports from a large U.S.-based academic health system (1000 consecutive reports, excluding 16 duplicates), the investigators compared the rule-based medspaCy system and the Clinical Assertion and Negation Classification Bidirectional Encoder Representations from Transformers (CAN-BERT) system to detect negated expressions of terms from RadLex, the Unified Medical Language System Metathesaurus, and the Radiology Gamuts Ontology. Power analysis determined a sample size of 382 terms to achieve α = 0.05 and power (1 − β) of 0.8 for McNemar's test; based on an estimate of 15% negated terms, 2800 randomly selected terms were annotated manually as negated or not negated. Precision, recall, and F1 of the two models were compared using McNemar's test. Of the 2800 terms, 387 (13.8%) were negated. For negation detection, medspaCy attained a recall of 0.795, precision of 0.356, and F1 of 0.492. CAN-BERT achieved a recall of 0.785, precision of 0.768, and F1 of 0.777. Although recall was not significantly different, CAN-BERT had significantly better precision (χ² = 304.64; p < 0.001). The transformer-based CAN-BERT model detected negated terms in radiology reports with high precision and recall; its precision significantly exceeded that of the rule-based medspaCy system. Use of this system will improve data extraction from textual reports to support information retrieval, AI model training, and discovery of causal relationships.
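The evaluation described above can be illustrated with a minimal, self-contained sketch: per-term negation predictions from two systems are scored against manual annotations (precision, recall, F1), and the systems are compared on their discordant classifications with McNemar's chi-square statistic. All labels below are hypothetical; the study's actual per-term outputs are not reproduced here.

```python
# Hypothetical evaluation sketch: binary negation labels (1 = negated,
# 0 = not negated) for a set of annotated terms.

def precision_recall_f1(gold, pred):
    """Precision, recall, and F1 treating label 1 (negated) as positive."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def mcnemar_chi2(gold, pred_a, pred_b):
    """McNemar's chi-square with continuity correction, computed from the
    discordant pairs: terms one system classifies correctly and the other
    misclassifies."""
    b = sum(pa == g and pb != g for g, pa, pb in zip(gold, pred_a, pred_b))
    c = sum(pa != g and pb == g for g, pa, pb in zip(gold, pred_a, pred_b))
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical annotations and system outputs (not the study's data).
gold = [1, 1, 1, 0, 0, 0, 0, 1]
rule = [1, 1, 0, 1, 1, 0, 0, 1]  # rule-based system (over-triggers)
bert = [1, 1, 0, 0, 0, 0, 0, 1]  # transformer-based system

p, r, f = precision_recall_f1(gold, rule)
chi2 = mcnemar_chi2(gold, rule, bert)
```

In practice one would use an established implementation such as `statsmodels.stats.contingency_tables.mcnemar` rather than hand-rolling the statistic; the point here is only to show how the reported χ² arises from paired correct/incorrect classifications of the same terms.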